What is Big Data?

May 6, 2020

Updated: Aug 7, 2023

Have you ever counted all the likes you have left on Facebook, or all the comments you have posted on LinkedIn? Have you wondered how many tweets you have responded to? The devices you use for all of these activities, your smartphones and computers, are among millions of such devices out there, and together they hold trillions of GB of data, from texts and media files to medical reports and your purchase history on Amazon.

None of this was being captured a few years ago, and that is what makes big data different from any other data we have dealt with in the past.

Pursuing analytics strategically will create an unprecedented amount of information of enormous variety and complexity. This is leading to a change in data management strategies known as big data. - Sondergaard

The characteristics of big data that are critical for insight generation are described by the four Vs.

The Four Vs

Volume

The overall amount of data generated each day is rising exponentially. Experts say that more data has been generated in the last two years than in all of human history before that. It is also projected that 2.3 trillion GB of data is generated each day.

Velocity

The rate at which data is generated is also increasing each day. Reports on how much data is generated in a single "internet second" show mind-boggling numbers.

In an internet second, more than 50K Google searches are completed, more than 125K YouTube videos are viewed, 7K tweets are sent out, and more than 2 million emails reach inboxes. The flow of data is huge and constant, and analysing it can help researchers and companies make valuable decisions.
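To put the per-second figures in perspective, they can be scaled up to a full day. The sketch below is only back-of-the-envelope arithmetic: it takes the numbers quoted above as given and assumes the rates stay constant over 24 hours, which real traffic does not. The names `PER_SECOND` and `per_day` are made up for this illustration.

```python
# Per-second figures quoted in the text (most are "more than" lower bounds).
PER_SECOND = {
    "google_searches": 50_000,
    "youtube_views": 125_000,
    "tweets": 7_000,
    "emails": 2_000_000,
}

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_day(rate_per_second: int) -> int:
    """Scale a per-second rate to a daily total, assuming a constant rate."""
    return rate_per_second * SECONDS_PER_DAY


daily_totals = {name: per_day(rate) for name, rate in PER_SECOND.items()}

for name, total in daily_totals.items():
    print(f"{name}: {total:,} per day")
```

Under that constant-rate assumption, 50K searches a second already works out to more than 4 billion searches a day.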

Variety

The endless variety of data is more impressive than its sheer volume. The diversity lies not only in the devices and sources that generate data but also in the types of data generated.

Data is generated through smartphones, laptops, tablets, fitness trackers, smartwatches and many other sources. The data collected is both structured and unstructured. Social media platforms like Twitter, Facebook, and Instagram are the most substantial sources, producing more data than any other communication tools.

At present, data scientists are more interested in unstructured data, which can take the form of social media activity, likes, comments, audio, video, GIFs or other media files. Using machine learning techniques and natural language processing, data scientists can analyse this data to understand customer behaviour.
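As a very small illustration of what working with unstructured data can look like, the sketch below turns free-text comments into word counts, which is often a first step before any machine learning or natural language processing is applied. The comments are invented for this example, and the `tokenize` helper is a deliberately naive assumption, not a real NLP pipeline.

```python
import re
from collections import Counter

# Hypothetical customer comments, invented for illustration only.
comments = [
    "Love the new phone, battery life is great!",
    "Battery drains too fast. Not great.",
    "Great camera, great screen.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


# Count how often each word appears across all comments.
word_counts = Counter(token for c in comments for token in tokenize(c))

print(word_counts.most_common(3))
```

Even this crude count gives unstructured text a structured shape (word, frequency) that can be fed into further analysis.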

Veracity

Veracity refers to the quality of the data being analysed. High-veracity data has many records that are valuable to analyse and that contribute in a meaningful way to the overall insights. Low-veracity data, on the other hand, contains a high percentage of meaningless data. The non-valuable portion of these data sets is referred to as noise. An example of a high-veracity data set would be data from a clinical trial.
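One rough way to get a feel for veracity is to measure what share of a data set is noise. The sketch below assumes, purely for illustration, that a record counts as noise when any of its fields is missing or empty. Real quality checks depend heavily on the domain, and the records, `is_noise` and `noise_ratio` are hypothetical.

```python
# Hypothetical records; a None or empty value stands in for "noise".
records = [
    {"patient_id": 1, "blood_pressure": 120},
    {"patient_id": 2, "blood_pressure": None},
    {"patient_id": 3, "blood_pressure": 135},
    {"patient_id": 4, "blood_pressure": 118},
]


def is_noise(record: dict) -> bool:
    """Treat a record as noise if any field is missing or empty (an assumption)."""
    return any(value is None or value == "" for value in record.values())


def noise_ratio(rows: list[dict]) -> float:
    """Return the fraction of rows flagged as noise; 0.0 for an empty data set."""
    if not rows:
        return 0.0
    return sum(is_noise(row) for row in rows) / len(rows)


print(f"noise ratio: {noise_ratio(records):.0%}")
```

Here one record in four is flagged, so a quarter of this toy data set would be treated as noise under the stated assumption.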

Data that is high volume, high velocity and high variety must be processed with advanced tools and algorithms to reveal meaningful information. We at Supl.ai Analytics can help you drive business value by setting up the analytics pipeline and generating actionable insights.

Supl.ai