The web is becoming more dynamic, context-aware, and personalized by the day, and the amount of information consumed by each person is increasing exponentially. But while hardware performance is improving, software infrastructure is not keeping pace, except when it comes to the simplest of parallel programming tasks. We need to develop new data processing architectures, ones that go beyond technologies like memcached, MapReduce, NoSQL, etc.
Think of this as a search problem. Traditionally, a search engine kept an index recording, for every word, the documents in which it occurred. When a query was received, the engine could simply look up the precomputed answer to which documents contained which word. For personalized search, an exponentially larger index is needed, one that includes not only factual data (words in a document, brands of cameras, etc.) but also taste and preference data (people who like this camera tend to live in cities, be under 40, love “Napoleon Dynamite,” etc.).
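The traditional approach described above is an inverted index. Here is a minimal sketch (the documents and words are made up for illustration):

```python
from collections import defaultdict

# Toy corpus: hypothetical document IDs mapped to their text.
docs = {
    1: "cheap camera with great lens",
    2: "camera reviews for city living",
    3: "great music from the city",
}

# Inverted index: each word maps to the set of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

# Answering a query is just a lookup plus a set intersection --
# the per-word answers were precomputed when the index was built.
def search(*words):
    results = [index[w] for w in words]
    return set.intersection(*results) if results else set()

print(sorted(search("camera")))          # -> [1, 2]
print(sorted(search("camera", "city")))  # -> [2]
```

The key property is that query time does no per-document work: everything expensive happened at index-build time, which is exactly what breaks down once the answer depends on who is asking.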
Unfortunately, personalizing along 100 taste dimensions leads to nearly as many permutations of recommendation rankings as there are atoms in the universe! Obviously, there isn’t enough space to precompute which recommendations to show every possible type of person that might query a site.
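A quick back-of-the-envelope calculation shows the scale of the problem. Assuming, purely for illustration, 100 binary taste dimensions and the common rough estimate of about 10^80 atoms in the observable universe:

```python
import math

ATOMS_IN_UNIVERSE = 10 ** 80  # common rough estimate, for illustration

# 100 binary taste dimensions already give 2**100 distinct user profiles.
profiles = 2 ** 100
print(f"{profiles:.2e} possible user profiles")  # ~1.27e+30

# Rankings explode even faster: n items can be ordered in n! ways.
# Find the smallest result-set size whose number of possible
# orderings exceeds the estimated number of atoms in the universe.
n = 1
while math.factorial(n) < ATOMS_IN_UNIVERSE:
    n += 1
print(f"{n}! orderings already exceed the atoms in the universe")
```

In other words, ranking even a few dozen items admits more possible orderings than there are atoms to store them, so precomputing one ranking per user profile is hopeless; the ranking has to be computed at query time.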