Seek, And Ye Shall Find (If Your Data Is Structured Properly)

Ever thought about the amount of information on the internet? The data available online grows at an incredible speed. The annual production of new online information is measured in exabytes (one exabyte is 1 million terabytes).

Finding the right information within this immense quantity of data would be impossible without search engines. They continuously crawl and index the internet to create and maintain an organized inventory of files, and they give users an interface through which they can easily access what they’re looking for.

What makes it possible to find what we want via a search engine is a sophisticated index structure that enables efficient storage and fast querying. The index organizes the indexed items in a way that is optimized for search: it mainly records which index terms (words) occur in which documents. Without this structure, retrieving the necessary information would be more difficult than finding a needle in a haystack.
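To make the idea concrete, here is a minimal sketch of such an index in Python. It is only an illustration, not how any particular search engine works: the function names `build_index` and `search`, the sample documents, and the naive whitespace tokenization are all assumptions made for this example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from a copy so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents.
docs = {
    1: "cheap red shoes",
    2: "red wine guide",
    3: "running shoes review",
}
index = build_index(docs)
print(search(index, "red shoes"))  # {1}
```

Answering the query only touches the entries for "red" and "shoes". No document is scanned at query time, which is why lookups stay fast as the collection grows.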

In general, structured organization helps people get an overview of the mass of online information. For example:

  • Websites are organized by their menu structure.
  • Webshops use product classification systems, grouping products into various categories.
  • Products and content are restructured:
    • Price comparison sites collect product and price information from a vast number of sources. They arrange them at the product level and display them in groups.
    • Mashups are another way of restructuring information. These are web pages or applications that combine and aggregate data from several sources to create new services, presenting useful information in different views or with extra functionality.

However, even if the information is well organized, users may still have difficulty finding what they want. For example, they may not know the right search term, or they may be unfamiliar with the categories a webshop provides. The structures and tools mentioned above are great, but they all miss one important point: they don’t provide any personalization to allow for a smooth customer experience.

Recommendation systems are the new generation of information organization tools, and they leap over these hurdles. They enable any kind of content provider to offer personalized services that deliver the right content to each user according to his or her tastes and interests. These systems bring us to the term “long tail”, which refers to the strategy of “selling less of more”. Chris Anderson published an article about it in Wired magazine way back in 2004 and later wrote a book on the topic.

Take online bookstores as an example. A store might stock hundreds of thousands of different books, but each customer will only be interested in a tiny portion of that stock, and may be looking for one single book.

The opposite of this is when you have a few popular items and sell massive amounts of them. Imagine the book and music section of Wal-Mart or Tesco. They only carry a few records and books, but they sell a lot of them. This is something of a “vicious cycle”: it’s hard to determine whether they stock up on these items because they are popular, or whether these items are popular because Wal-Mart sells incredible amounts of them. But this is a different topic. (More on this topic in an article by Chris Anderson.)

With the long tail, what you do is offer an enormous variety of products. These are products with a huge, taste-dependent variety (like music, books, movies, etc.) or products that previously could not be sold because too few customers at any one store were interested in them. This is what works well for iTunes, eBay or Netflix. The number of individual items found on these sites is incredibly large, and some of these items are one of a kind.

So how can this ocean of data be tamed, and more importantly, how can customers see offers they are interested in?

Recommendation engines are the answer. They are intelligent systems that can predict users’ interests without requiring any input from them beyond their normal browsing behavior. Before users even start to search for an item, a recommendation engine is already preparing the content they want to find.
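As a rough illustration of predicting interests from browsing behavior alone, the sketch below recommends items that were often viewed in the same session as the items a user has already looked at. This is a deliberately simple co-occurrence approach chosen for the example, not a description of how production recommendation engines work. The function names `build_co_views` and `recommend` and the sample sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_co_views(sessions):
    """Count how often each pair of items appears in the same session."""
    co_views = {}
    for items in sessions:
        # Deduplicate so repeat views within one session count once.
        for a, b in combinations(sorted(set(items)), 2):
            co_views.setdefault(a, Counter())[b] += 1
            co_views.setdefault(b, Counter())[a] += 1
    return co_views

def recommend(co_views, viewed, k=3):
    """Suggest up to k unseen items most often co-viewed with `viewed`."""
    scores = Counter()
    for item in viewed:
        for other, count in co_views.get(item, {}).items():
            if other not in viewed:
                scores[other] += count
    return [item for item, _ in scores.most_common(k)]

# Hypothetical browsing sessions from an online bookstore.
sessions = [
    ["dune", "neuromancer", "foundation"],
    ["dune", "foundation"],
    ["neuromancer", "snow crash"],
    ["dune", "foundation", "hyperion"],
]
co_views = build_co_views(sessions)
recs = recommend(co_views, {"dune"})
print(recs[0])  # foundation
```

The user never typed a query or filled in a profile. The only input is which items appeared together in past sessions, and that is enough to rank "foundation" first for someone who has just viewed "dune".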
