Moving from words to concepts by DAM News Staff


DAM News Staff

June 28, 2010


ctrl, the first product of the R&D group at PRAGMATECH, is a text-processing library that performs semantic analysis on (news-like) textual documents. The ctrl API can be used to generate lucid summaries of documents, to extract key topics, and, most importantly, to index and retrieve documents by concepts and topics rather than by keywords.

In ctrl we process and analyze concepts, not words, and this includes names of things (people, countries, organizations, products, etc.). Thus, from words we try to infer the most likely meaning in context by a process known as Word Sense Disambiguation (WSD), which is an essential part of the understanding of any piece of text.
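To make the idea of Word Sense Disambiguation concrete, here is a minimal sketch in Python of a simplified Lesk-style approach: pick the sense whose dictionary gloss shares the most words with the surrounding context. This is a textbook technique used purely for illustration; the article does not say how ctrl performs WSD, and the sense inventory, `tokens`, and `disambiguate` are hypothetical names invented for this example.

```python
# Toy sense inventory (hypothetical; not ctrl's data or API).
SENSES = {
    "bank": {
        "bank/finance": "institution that accepts deposits and lends money",
        "bank/river": "sloping land beside a river or lake",
    },
}

# Small stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "and", "the", "of", "or", "that", "to", "in", "on", "by"}


def tokens(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def disambiguate(word, context):
    """Return the sense id whose gloss shares the most words with the context."""
    ctx = tokens(context)
    candidates = SENSES[word]
    return max(candidates, key=lambda s: len(tokens(candidates[s]) & ctx))


print(disambiguate("bank", "She deposits money at the bank every Friday"))
print(disambiguate("bank", "They fished from the bank of the river"))
```

Gloss overlap is a deliberately crude signal; it is shown only because it makes the step from a word to one of its candidate meanings easy to see.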

While it is a crucial step in this process, going from words to concepts is not the end goal in and of itself since topics of interest are rarely described by single words, but are usually expressed by complex linguistic objects (mostly by nominal phrases).

In ctrl we have therefore gone beyond words and concepts (meanings) and moved from concepts to topics, which can be thought of as compound concepts composed of smaller, more primitive concepts.

Finally, even if we infer the correct meaning of words in context with high accuracy, and even if we then combine those concepts into topics, what we are ultimately interested in is the set of key topics in a document: what the document is essentially “about”, regardless of whatever other words, concepts, or topics it also mentions.
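The distinction between topics that are merely mentioned and key topics can be sketched as a ranking step. The Python example below assumes the topics have already been extracted and uses raw mention frequency as a stand-in score; the article does not describe how ctrl actually selects key topics, and `key_topics` and the sample data are hypothetical.

```python
from collections import Counter


def key_topics(topic_mentions, top_n=2):
    """Rank topics by how often they are mentioned and keep the top few.

    Frequency is an assumed placeholder for whatever salience measure
    a real system would use.
    """
    counts = Counter(topic_mentions)
    return [topic for topic, _ in counts.most_common(top_n)]


# One entry per mention, in document order (made-up sample data).
mentions = [
    "oil spill", "Gulf of Mexico", "oil spill",
    "BP", "oil spill", "Gulf of Mexico",
]
print(key_topics(mentions))
```

A topic mentioned once in passing ("BP" in this sample) is still a topic of the document, but it does not make the cut as a key topic.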

In summary, in ctrl we have moved from words to concepts, then from simple concepts to topics, and, subsequently, to key topics that represent what a certain document is about.