A happy quack to the reader who alerted me to Extractiv. The company is in the “content provisioning business”, and I did not know what this phrase meant. I know about “telecommunications provisioning”, but the “content” part threw me. I followed the links my reader sent me and located an interview (“Quick Q&A on Extractiv”) on the AndyHIckl.com blog. It took me about a half hour to figure out that the interviewer and the interview subject seemed to be the same person.
The key points that pierced the addled goose’s skull were:
- The service “helps consumers ‘make sense’ of large amounts of unstructured text.” The method is natural language processing.
- Unstructured text is transformed into structured text for sentiment tracking and semantic search.
- The technology is a “unique distributed computing platform” that “makes it possible for us to crawl — and extract content from — zillions of pages at the same time. (Our performance is pretty unbeatable, too: we’re currently able to download and extract content from 1 million pages in just under an hour.)”
- “Extractiv’s a joint venture between two companies: 80Legs and Language Computer. It’s really a great match. 80Legs offers the world’s first truly scalable web crawling platform, while Language Computer provides some of the world’s best — and most scalable — natural language processing tools.”