"Big Data" is Dead. What’s Next?

Over the last few years, the term "big data" has been hyped everywhere: on the covers of serious scientific journals such as "Science" and "Nature"; in popular magazines such as "The Economist"; in the strategic documents of federal agencies; in the ads of top IT companies; and, of course, all over the web.

The emergence of the term is due to a real problem: the increasing volume, velocity, and variety of data make it impossible to process, analyze, and interpret the data efficiently with existing tools. And this problem existed well before the term "big data" did.

Back in the 1960s, many scientists were already talking about the "explosion of information" and proposing techniques to cope with it. The first collaborative organization to deal with increasing amounts of data was the Committee on Data for Science and Technology (CODATA), established by the International Council for Science in 1966.

With the emergence of very complex scientific experiments and applications, especially in high-energy physics, astronomy, climate modeling, and genomics, the data problem has come to dominate scientific discussions over the last decade. Today even the layperson knows of scientific projects such as the Large Hadron Collider, the Human Genome Project, and the Large Synoptic Survey Telescope, each generating petabytes of data every year.

The worldwide adoption of online shopping and e-commerce; web hosting of emails, videos, and images; the widespread use of smartphones and mobile computing; and the recent boom of social networking brought the problem to the attention of the entire world. According to some, 2007 was a turning point in the history of large-scale data management, since that was the year the amount of digital information first exceeded the amount of available storage.

It didn't take long for the IT industry to turn this problem into a profitable market. "Any established vendor offering a storage or analytics product for a tiny or a large amount of data is now branded as big data, even if their technology is exactly the same as it was 5 years ago," says John De Goes, founder and CEO of the big data company Precog, in his recent blog post "Big data is dead. What's next?"

John is right. Any solution involving databases, data mining, or data storage is now labeled a "big data" solution. Unfortunately, this *over*use of the term "big data" in the industry is making it less and less meaningful.

Similarly, everybody now calls themselves a "data scientist". This overhyped terminology diminishes the value of the work done by the people and companies that have been tackling real data challenges for years.


Fortunately, leading colleges and universities recognize the growing need for workforce development in this area and are creating new degree programs in "data science and engineering". Hopefully this new generation will have a great impact in addressing this big problem.

Personally, I believe that increasing data volume, velocity, and heterogeneity will remain a major challenge in all aspects of computing over the next decade, but that the industry will continue to exploit, and thereby diminish, the value of the term "big data".

In short, the data problem will continue to be there; what we will call it next... we will see.
