Big Data - What's the Big Deal? by Jose

Part 1 of 2

July 25, 2012

BIG DATA is more than just the fancy new buzzword, though the term has been around for a very long time. As a matter of fact, it’s been used in so many different ways recently that it has created a lot of confusion. Traditionally reserved for analytics applications, it now extends into other areas of extreme content growth. It can come in the form of “human information”, meaning person-to-person communication like emails, videos, phone calls, voicemails and social media updates. It can also come as extreme data sets filled with information generated through sensors, tags, GPS signals, etc. Big data presents some very real organizational barriers, but also some good opportunities.

According to IBM, we create 2.5 quintillion (or 2.5 × 10^18) bytes of data every day, with 90 percent of the data in the world created in the last two years alone.
What does this mean in context? For example, every hour Walmart handles one million transactions, feeding a database of 2.5 petabytes, which is almost 170 times the data in the Library of Congress. The entire collection of junk mail delivered by the U.S. Postal Service in one year is equal to five petabytes, while Google processes that amount of data in just one hour. The total amount of information in existence is estimated at a little over something called a “zettabyte”, yet the U.S. Office of Weights and Measures has coined a term for an amount of data still too big to imagine: a “yottabyte”. (Yes, I had to look up new terms and do a little research because I am still trying to wrap my head around this!)
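To help keep these units straight, here is a minimal Python sketch that converts raw byte counts into the decimal prefixes used above. It assumes the decimal (SI) convention in which each prefix is 1,000 times the previous one. The `humanize` helper and the constant names are my own illustration and do not come from IBM or any other source quoted here.

```python
# Decimal (SI) byte prefixes; each step up is a factor of 1,000.
# Assumption: decimal units (1 petabyte = 10**15 bytes), not binary ones.
PREFIXES = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
            "petabytes", "exabytes", "zettabytes", "yottabytes"]


def humanize(num_bytes: float) -> str:
    """Express a byte count with the largest prefix that keeps the value >= 1."""
    value = float(num_bytes)
    index = 0
    while value >= 1000 and index < len(PREFIXES) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {PREFIXES[index]}"


DAILY_BYTES = 2.5e18   # the IBM figure quoted above: 2.5 quintillion bytes per day
ZETTABYTE = 1e21       # one zettabyte in bytes

print(humanize(DAILY_BYTES))      # 2.5 exabytes
print(humanize(5e15))             # 5 petabytes
print(ZETTABYTE / DAILY_BYTES)    # 400.0 (days to produce one zettabyte)
```

Put another way, at 2.5 quintillion bytes a day it would take roughly 400 days of data creation to fill a single zettabyte.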

Joel Richard of the Smithsonian gave an incredible presentation on “Implementing a Linked Open Data Set” at this past week’s SLA Conference in Chicago. His presentation focused on some actual linked data sets and how they have grown over the past five years. As he mentioned, back in 2007 there were only a handful of data sets, according to Richard Cyganiak’s “searching.” Between 2009 and 2010 the number of items doubled. As of September 2011 there were 295 data sets listed. There are probably more being added every day.

Organizations across all industries globally are struggling with how to retain, aggregate and analyze this mounting volume of “BIG DATA”. Interactivity is driving big data, with people and machines both consuming and creating it. Joel gave an example of what the Smithsonian is doing: having technology do the heavy lifting with data and having humans clean it up. The technology achieves 99.7 percent accuracy. To me, that is astounding!

Digital companies that focus on becoming good at aggregating and analyzing the data created by the end users of their products, and that then provide their customers with solid insights taken from that data, have a distinct competitive advantage over others in the marketplace. Big data is unlocking not just new information but new sources of economic and business value.

Linked data is a huge opportunity for a Digital Asset Management System (DAMS). But how do we tame it? While I do not have the answers, it is clear that some of this BIG DATA should definitely be a part of your DAMS, not just for monetizing, but also for long-term archival strategies.