The statistically improbable phrase
I have a new favorite catchphrase this week, courtesy of a search function on Amazon.com. Now that the online retailer has access to the full text of many of its books, it has been adding more elaborate search functions to its service.
One of the more interesting is the algorithm for ‘statistically improbable phrases,’ or those combinations of words that set one published work apart from the rest. According to Amazon:
Statistically Improbable Phrases, or “SIPs”, are the most distinctive phrases in the text of books in the Search Inside! program. To identify SIPs, our computers scan the text of all books in Search Inside. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside books, that phrase is a SIP in that book.
These statistical searches are intended to find catchphrases, core ideas, or unusual concepts that might help someone find a book, or other books like the one they have already found. For example, some SIPs for Malcolm Gladwell’s new book Blink include rapid cognition, intuitive repulsion, and adaptive unconscious (thanks to onfocus for the links). You could also look for general concepts and current hot topics such as the creative class or the attention economy.
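For the curious, here is a rough sketch of how a score like this might work. It is not Amazon's actual method, which the description above leaves unspecified. I am assuming that a "phrase" is a pair of adjacent words, and that "a large number of times relative to all books" means a simple ratio of the phrase's rate in one book to its rate across the whole collection. The function names and the min_count cutoff are my own inventions for illustration.

```python
import re
from collections import Counter


def bigrams(text):
    """Split text into lowercase words and return every adjacent word pair."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]


def improbable_phrases(book, other_books, top_n=3, min_count=2):
    """Rank two-word phrases in `book` by how over-represented they are.

    Assumption: the score is (rate in this book) / (rate in all books),
    where "all books" means this book plus `other_books`.
    """
    book_counts = Counter(bigrams(book))

    # Counts over the whole collection, including the book itself,
    # so every phrase in the book has a nonzero collection count.
    all_counts = Counter(book_counts)
    for text in other_books:
        all_counts.update(bigrams(text))

    book_total = sum(book_counts.values())
    all_total = sum(all_counts.values())

    scores = {}
    for phrase, count in book_counts.items():
        if count < min_count:
            continue  # ignore phrases that barely appear in the book
        book_rate = count / book_total
        all_rate = all_counts[phrase] / all_total
        scores[phrase] = book_rate / all_rate

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_n]
```

Calling improbable_phrases with the text of one book and a list of other texts returns the phrases that book leans on far more than the rest of the collection does. Everyday pairs such as "of the" score low because they turn up everywhere.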
On the more creative side, if I wanted to find books that reference a favorite poem (such as James Wright’s “Autumn Begins in Martins Ferry, Ohio”), I could enter the most distinctive phrase from that work, “suicidally beautiful,” and see what pops up.
I’m not yet sure what this has to do with cultural management. But something powerful is happening to the way we discover and engage with creative works, as more of them are poured into the great big bucket of content we call the Internet.