Meaning = Data + Structure: User Generated Structure by DAM News Staff

October 25, 2007

Meaning = Data + Structure: User Generated Structure October 24, 2007

Posted by jeremyliew in Lightspeed Venture capital

I’ve been thinking about how the explosion of user generated content that has characterized web 2.0 can be made more useful by the addition of structure, i.e. meaning = data + structure.

The obvious way that structure can be added to user generated content is by asking users to do it – user generated structure.

There are at least four ways that I can think of to get at user generated structure.

Tagging is the first approach, and its use has been endemic to web 2.0. Sometimes the tagging is limited to the author of the content, and other times any user can add tags to create a folksonomy. Most, if not all, social media companies employ some form of tagging, including Flickr for photos, Stylehive for fashion and furniture (Stylehive is a Lightspeed company) and Kongregate for games. Tagging is a great first step, but it has well-known limitations, as Wikipedia notes:

Folksonomy is frequently criticized because of its lack of terminological control that it seems to be more likely to produce unreliable and inconsistent results. If tags are freely chosen instead of taken from a given vocabulary, synonyms (multiple tags for the same concept), homonymy (same tag used with different meaning), and polysemy (same tag with multiple related meanings) the efficiency of indexing and searching of content is lower.[3] Other reasons for inaccurate or irrelevant tags (also called meta noise) are the lack of stemming (normalization of word inflections) and the heterogeneity of users and contexts.
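To make the synonym and stemming problems concrete, here is a minimal sketch of how a site might normalize free-form tags before indexing them. The synonym table, the `naive_stem` helper and the sample tags are all hypothetical, invented for illustration; a real system would likely use a proper stemmer and a much larger vocabulary. Note that this only addresses synonyms and word inflections. Homonymy and polysemy need context and cannot be fixed by string cleanup alone.

```python
from collections import defaultdict

# Hypothetical synonym table: maps variant tags to one canonical tag.
SYNONYMS = {
    "nyc": "new york",
    "new york city": "new york",
    "couch": "sofa",
}


def naive_stem(word):
    """Very rough plural stripping; stands in for a real stemmer."""
    if len(word) > 3 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize_tag(tag):
    """Lowercase, collapse whitespace, map synonyms, then stem each word."""
    cleaned = " ".join(tag.lower().split())
    cleaned = SYNONYMS.get(cleaned, cleaned)
    return " ".join(naive_stem(word) for word in cleaned.split())


def group_tags(tags):
    """Group raw tags under their normalized form."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalize_tag(tag)].append(tag)
    return dict(groups)
```

Calling `group_tags(["Sofas", "couch", "sofa"])` puts all three raw tags under the single key `"sofa"`, so a search for any one of them can find content tagged with the others.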

The second approach is to solicit structured data from users. Examples of sites that do this include wikiHow (which breaks down each how-to entry into sections such as Introduction, Steps, Tips, Warnings and Things You’ll Need), CitySearch (which asks you for Pros and Cons and for specific ratings on dimensions such as Late Night Dining, Prompt Seating, Service and Suitability for Kids) and PowerReviews (which powers product reviews at partner sites that prompt for Pros, Cons, Best Uses and User Descriptions, including both common responses as check boxes and a freeform text field with autocomplete).

This can be a powerful tool to add structure to data, but as one of the commenters on my last post points out:

From a UGC perspective, site administrators can force structure by requiring every site contribution to have a parent category, or descriptive tags. The problem is that the more obstacles you put in place before content can be submitted, the less participation you are going to get.
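The trade-off between structure and participation can be sketched in code. The example below is a rough illustration, not how CitySearch or any other site actually works: the `Review` class, its field names and the `missing_fields` helper are assumptions made up for this sketch. The rating dimensions are the ones mentioned above. The point is that the site chooses how many fields are required, and every extra required field is one more obstacle before a contribution is accepted.

```python
from dataclasses import dataclass, field

# Rating dimensions taken from the CitySearch example in the text.
RATING_DIMENSIONS = (
    "Late Night Dining",
    "Prompt Seating",
    "Service",
    "Suitability for Kids",
)


@dataclass
class Review:
    """Hypothetical structured review: free text plus optional structure."""
    text: str
    pros: list = field(default_factory=list)
    cons: list = field(default_factory=list)
    ratings: dict = field(default_factory=dict)


def missing_fields(review, required):
    """Return the names of required fields the contributor left empty."""
    return [name for name in required if not getattr(review, name)]


def unknown_dimensions(review):
    """Return rating keys that are not among the prompted dimensions."""
    return [dim for dim in review.ratings if dim not in RATING_DIMENSIONS]
```

With a lenient policy such as `required=("text",)`, a review containing only free text passes. With a strict policy such as `required=("text", "pros", "cons", "ratings")`, the same review is rejected until the user fills in everything, which yields better structure but presumably fewer submissions.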

The third approach to user generated structure is the traditional approach to the Semantic Web. As Alex Iskold notes in ReadWriteWeb:

The core idea is to create the meta data describing the data, which will enable computers to process the meaning of things. Once computers are equipped with semantics, they will be capable of solving complex semantical optimization problems. For example, as John Markoff describes in his article, a computer will be able to instantly return relevant search results if you tell it to find a vacation on a 3K budget.

In order for computers to be able to solve problems like this one, the information on the web needs to be annotated with descriptions and relationships. Basic examples of semantics consist of categorizing an object and its attributes. For example, books fall into a Books category where each object has attributes such as the author, the number of pages and the publication date.
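A toy version of this idea: once objects carry a category and attributes, a question like the vacation-on-a-budget example becomes a simple filter. The sketch below stores statements as (subject, predicate, object) tuples in plain Python. The identifiers, attribute names and values are all made up for illustration, and this is not a real Semantic Web toolkit, only a minimal model of "descriptions and relationships".

```python
# Made-up statements of the form (subject, predicate, object).
TRIPLES = [
    ("book:1", "type", "Book"),
    ("book:1", "author", "A. Author"),
    ("book:1", "pages", 320),
    ("book:2", "type", "Book"),
    ("book:2", "author", "B. Writer"),
    ("book:2", "pages", 150),
    ("trip:1", "type", "Vacation"),
    ("trip:1", "price_usd", 2800),
    ("trip:2", "type", "Vacation"),
    ("trip:2", "price_usd", 4500),
]


def attributes(subject):
    """Collect every predicate/object pair recorded for one subject."""
    return {p: o for s, p, o in TRIPLES if s == subject}


def find(category, predicate, test):
    """Return subjects in a category whose attribute value passes the test."""
    subjects = [s for s, p, o in TRIPLES if p == "type" and o == category]
    matches = []
    for subject in subjects:
        attrs = attributes(subject)
        if predicate in attrs and test(attrs[predicate]):
            matches.append(subject)
    return matches
```

Here `find("Vacation", "price_usd", lambda price: price <= 3000)` returns only `trip:1`, and `find("Book", "pages", lambda n: n < 200)` returns only `book:2`. The query is trivial; the hard part, as the rest of this post discusses, is getting someone to write the annotations in the first place.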

Ideally, each web site creator would use an agreed format to mark up the meaning of each statement made on the page, in much the same way that they mark up the presentation of each element of a webpage in HTML. In a subsequent article, Iskold also notes some of the challenges with a bottom-up approach to building the Semantic Web, which can be summarized at a high level as “it’s too complicated” and “no one wants to do the work”.

The fourth approach to user generated structure is to build a central authority of meaning. Metaweb appears to be trying to do this with Freebase, a sort of “Wikipedia for structured data” which describes itself as follows:

Freebase is an open database of the world’s information. It’s built by the community and for the community – free for anyone to query, contribute to, build applications on top of, or integrate into their websites.

Already, Freebase covers millions of topics in hundreds of categories. Drawing from large open data sets like Wikipedia, MusicBrainz, and the SEC archives, it contains structured information on many popular topics, including movies, music, people and locations – all reconciled and freely available via an open API. This information is supplemented by the efforts of a passionate global community of users who are working together to add structured information on everything from philosophy to European railway stations to the chemical properties of common food ingredients.

By structuring the world’s data in this manner, the Freebase community is creating a global resource that will one day allow people and machines everywhere to access information far more easily and quickly than they can today.
