Posted by Seth Grimes http://intelligent-enterprise.informationweek.com
I imagine that Google employs many hundreds of data scientists, folks whose job is to study and turn to good use the huge masses of data that the search-advertising-application services giant and its users generate. Each document indexed, each search, each ad served, each service call creates data. This data is used to create a better Google: easier to use, faster, more accurate, effective, and functional, and yes, more profitable. Google Instant, out a couple of weeks ago, is the latest initiative toward these ends. I’m salivating (figuratively) at the thought of the new data it generates.
This article uses Google Instant as a jump-off point, so if you’re not yet familiar with Instant, the video mashup Google Instant with Bob Dylan will show you how it works. It’s worth 46 seconds of your time.
Googlers Alon Halevy, Peter Norvig, and Fernando Pereira wrote last year in The Unreasonable Effectiveness of Data:
The first lesson of Web-scale learning is to use available large-scale data rather than hoping for annotated data that isn’t available. For instance, we find that useful semantic relationships can be automatically learned from the statistics of search queries and the corresponding results or from the accumulated evidence of Web-based text patterns and formatted tables, in both cases without needing any manually annotated data.
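To make the idea of learning relationships from query statistics concrete, here is a minimal sketch. It is not Google's method. It assumes a tiny, made-up log of (query, clicked result) pairs and treats two queries as related when they lead to the same results, scored by simple set overlap (Jaccard similarity). The log contents and the `related_queries` function are hypothetical, invented for illustration; no manual annotation is involved, which is the point of the quote.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy log of (query, clicked result) pairs; not real data.
LOG = [
    ("bob dylan", "bobdylan.com"),
    ("bob dylan", "en.wikipedia.org/wiki/Bob_Dylan"),
    ("dylan songs", "bobdylan.com"),
    ("dylan songs", "en.wikipedia.org/wiki/Bob_Dylan"),
    ("subterranean homesick blues", "en.wikipedia.org/wiki/Bob_Dylan"),
    ("weather boston", "weather.gov"),
    ("boston forecast", "weather.gov"),
]


def related_queries(log):
    """Score query pairs by the Jaccard overlap of their clicked results."""
    # Collect the set of results clicked for each distinct query.
    clicks = defaultdict(set)
    for query, result in log:
        clicks[query].add(result)

    # Compare every pair of queries; keep only pairs that share a result.
    scores = {}
    for a, b in combinations(sorted(clicks), 2):
        shared = clicks[a] & clicks[b]
        if shared:
            scores[(a, b)] = len(shared) / len(clicks[a] | clicks[b])
    return scores


scores = related_queries(LOG)
# Print the related pairs, strongest overlap first.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

On this toy log the Dylan queries cluster together and the Boston weather queries cluster together, while no Dylan query is linked to a weather query. A real system would work from vastly more data and far noisier signals, but the relationships would still come from usage statistics rather than hand labeling.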