Recommind productizes its categorization engine
18-Aug-2009 — CMS Watch
Anyone who’s been involved in a corporate-taxonomy project knows exactly how the terms “tedium,” “tiresome,” and “taxonomy” are related. Each derives from the other.
At some point, technology should remove the need for taxonomy projects, even if it hasn’t yet.
Help is on the way, though, assuming you have, say, $150K (plus or minus a Toyota) to spend. Today, San Francisco-based Recommind, Inc. (one of the vendors we cover in our Search & Information Access Report) is introducing MindServer Categorization, a software system that does just what its name implies: It analyzes content, discovers logical categories within the content, and auto-tags each content item according to how closely it relates to each category.
Although it’s being introduced today as a standalone product, MindServer Categorization is, technically speaking, not new. The product has been sold for years in Germany, where major media companies have used it to auto-categorize news feeds. Today’s release represents the first time MindServer Categorization has been localized into English and productized for a general market (i.e., not just media firms).
Recommind is not the only company with auto-categorization technology, of course. (Autonomy, often seen on shortlists next to Recommind, is a familiar source of such technology.) But unlike others, Recommind uses PLSA (Probabilistic Latent Semantic Analysis) as a basis for category discovery, which means, among other things, that Recommind’s software requires no training: It doesn’t need to be exposed to a “training set” (or sets), have access to a preexisting taxonomy, or know about keywords. In fact, MindServer Categorization is not only self-training but language-agnostic. In theory, the underlying algorithms can discriminate categories in any corpus, regardless of what language the corpus is in.
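
For readers who want a feel for what "no training set" means in practice, here is a minimal textbook-style sketch of PLSA fitted with expectation-maximization. It is not Recommind's implementation, which is not public; the function names (plsa, assign_category), the toy documents, and the choice of two categories are all assumptions made for illustration. The only input is word counts per document, which is also why the approach is indifferent to the language of the text.

```python
import random
from collections import Counter


def normalize(values):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def plsa(docs, n_topics, n_iter=50, seed=0):
    """Fit P(topic | doc) and P(word | topic) by EM. Illustrative only."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    counts = [Counter(doc) for doc in docs]

    # Random starting point; no labels, taxonomy, or keywords are supplied.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in docs]
    p_w_z = [dict(zip(vocab, normalize([rng.random() + 0.1 for _ in vocab])))
             for _ in range(n_topics)]

    for _ in range(n_iter):
        # Accumulators start at a tiny value to avoid exact zeros.
        new_w_z = [dict.fromkeys(vocab, 1e-12) for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in docs]
        for d, cnt in enumerate(counts):
            for w, n in cnt.items():
                # E-step: P(topic | doc, word) is proportional to
                # P(topic | doc) * P(word | topic).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # M-step accumulation, weighted by the word count n.
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        p_w_z = [dict(zip(vocab, normalize([t[w] for w in vocab])))
                 for t in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_z_d, p_w_z


def assign_category(topic_probs):
    """Auto-tag a document with its most probable discovered topic."""
    return max(range(len(topic_probs)), key=lambda z: topic_probs[z])


# Hypothetical toy corpus: pre-tokenized documents, no labels attached.
docs = [
    ["court", "judge", "ruling", "appeal", "court"],
    ["judge", "appeal", "lawyer", "ruling"],
    ["goal", "match", "striker", "league", "goal"],
    ["league", "match", "coach", "striker"],
]
p_z_d, p_w_z = plsa(docs, n_topics=2)
tags = [assign_category(row) for row in p_z_d]
```

Here tags holds one discovered-category index per document. The indices are arbitrary labels rather than human-readable category names, and because EM starts from a random point, results on real data can vary from run to run; a production system would need far more than this sketch, including a way to choose the number of categories.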