Why Big Data?

Ashan Sandarathna
4 min read · Jul 11, 2021


Data to Big Data (image credit: Ashan Madhuwantha)

We all know that the world now runs on data, and "Big Data" has become one of the buzzwords of 2021. Let's dive in and understand it clearly.

What is Big Data?

The word "Big" already suggests something large and complex. To put it in words, Big Data refers to data sets so large and complex that they cannot be processed or worked with using traditional data warehousing tools and systems.

How is Big Data generated?

According to an International Data Corporation (IDC) report, the total volume of data generated in 2020 was 64.2 zettabytes (64.2 × 10²¹ bytes). By 2025, worldwide data creation is projected to grow to more than 180 zettabytes.

© Statista 2021
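To put those zettabyte figures in perspective, here is a quick back-of-the-envelope calculation (a minimal Python sketch; the compound-growth formula is my own illustration, not part of the IDC report):

```python
# Rough arithmetic on the IDC/Statista figures quoted above.
ZETTABYTE = 10 ** 21  # 1 zettabyte = 10^21 bytes

created_2020_zb = 64.2    # total data created in 2020, in zettabytes
forecast_2025_zb = 180.0  # projected total for 2025, in zettabytes

print(f"2020 volume in bytes: {created_2020_zb * ZETTABYTE:.2e}")

# Implied compound annual growth rate between 2020 and 2025
years = 2025 - 2020
cagr = (forecast_2025_zb / created_2020_zb) ** (1 / years) - 1
print(f"Implied growth per year: {cagr:.1%}")  # roughly 23% per year
```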

Big Data is generated by machines, humans, and nature. Big data generated by nature is a particularly special, large, and very interesting area of research. With the improvement of technologies and services, big data is generated from many different sources, and it can be structured, unstructured, or semi-structured.

Types of Big Data

Big data types
  1. Structured Data
    Structured data is one of the types of big data. The word "structured" suggests something formatted or organized. Structured data is data that can be processed, stored, and retrieved in a clear, fixed format, and those operations can be handled even by simple search-engine algorithms. The most common examples of structured data are Excel files and SQL databases.
  2. Unstructured Data
    Weather data, audio files, geospatial data, emails, sensor data, and output from AI or machine learning are some examples of unstructured data. This type of data is very difficult to handle and time-consuming to process and analyze. Data that does not have any specific form or structure is called unstructured data.
  3. Semi-structured Data
    Semi-structured data contains elements of both structured and unstructured data. It refers to data that has not been organized into a particular repository, yet still carries tags or markers that describe its own layout, which is why it is also known as self-describing data. XML and JSON are common forms of semi-structured data (see the small sketch after this list).

The famous 5 V's

Characteristics of big data
  1. Volume
    When we think of big data, the first thing that comes to mind is volume. This characteristic is the hardest challenge for conventional IT infrastructure. Every second, big companies like Google, Facebook, and Twitter collect and generate huge amounts of data.
  2. Variety
    Variety is another of the most important characteristics of big data. It represents the many kinds of sources from which big data is generated. Approximately 90% of data is generated in unstructured form. In the past, data was mostly available in spreadsheets and databases, but nowadays it arrives in many formats such as photos, audio, videos, and text. This variety affects both storage and analysis, increasing the complexity of each.
  3. Velocity

Here we are at the next V. Velocity describes the speed at which big data is generated, and the speed at which it must be processed and stored. Per day in 2020:
· 293+ billion emails were sent
· 500+ million tweets were posted
· 3.2+ billion likes and comments were made on Facebook, and so on.
These statistics give some insight into the velocity of big data; the quick calculation below turns the daily figures into per-second rates.
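A minimal Python sketch of that conversion (the daily counts are the rough figures quoted above; everything else is plain arithmetic):

```python
# Convert the rough per-day figures quoted above into per-second rates.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day

daily_counts = {
    "emails sent": 293e9,                # 293+ billion per day
    "tweets posted": 500e6,              # 500+ million per day
    "Facebook likes/comments": 3.2e9,    # 3.2+ billion per day
}

for name, per_day in daily_counts.items():
    print(f"{name}: ~{per_day / SECONDS_PER_DAY:,.0f} per second")
```

Even these rounded figures work out to millions of emails and thousands of tweets every single second.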

4. Value
We know it costs a lot of money to deal with big data, because storing and analysing it is expensive. So value is also an important aspect among the characteristics of big data. Designing and implementing the infrastructure to store and analyse big data is hard and costly, which is why its potential value must be high.

5. Veracity
In general, veracity refers to truthfulness or accuracy. In big data, veracity refers to the biases, noise, and abnormalities in the data. Given characteristics like high volume, high velocity, and huge variety, it is not possible to expect 100% accuracy; there will always be some false data. The accuracy of the analysed results depends on the veracity of the source data.
