What is Big Data?

Chethani Dilhari
4 min read · Apr 10, 2021

Data consists of characters, symbols and quantities that are stored and transmitted as electrical signals. Big data is also data, but on a much larger scale: it describes a huge collection of data that grows larger and more complex over time. As an example, Facebook generates more than 500 TB of new data every day in the form of photos, videos and messages. Big data can be human generated or machine generated, and it is found in two forms: structured and unstructured. Structured data has a fixed, standard format for accessing, processing and storing, and is typically found in large relational databases. Unstructured data can be found everywhere and has no predefined structure or format, so handling it and deriving value from it is a challenge. Social media data and mobile data are examples of unstructured data.

Characteristics of Big Data

Big data is characterized along three main dimensions, known as the 3Vs: (1) volume, (2) velocity and (3) variety. These characteristics can be used to define and differentiate big data. Volume refers to the size of the data, which is measured in petabytes, exabytes or zettabytes. As an example, it is estimated that Walmart produces 2.5 petabytes of customer data every hour. Volume is also what qualifies a particular data set as big data. Velocity stands for the speed at which data is generated. As an example, Facebook users around the world upload millions of photos a day, and these rapid data flows have to be managed and processed to gain value. Variety means the different sources and forms of data, such as e-mails, photos, sensor data, videos and audio. Email messages, for instance, contain text as well as many kinds of attachments. This kind of unstructured data creates issues in storing and analyzing.
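To get a feel for volume and velocity, the two figures quoted in this article can be converted into other time scales. The short sketch below is only back-of-the-envelope arithmetic: it assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and treats both figures as rough estimates. The helper name `bytes_per_second` is made up for this illustration.

```python
# Assumption: decimal (SI) units, 1 TB = 10**12 bytes and 1 PB = 10**15 bytes.
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(total_bytes, seconds):
    """Average data rate for a volume produced over a time span."""
    return total_bytes / seconds


# Figures quoted in the article, treated here as rough estimates.
facebook_daily = 500 * TB   # "more than 500 TB of new data every day"
walmart_hourly = 2.5 * PB   # "2.5 petabytes of customer data every hour"

# Velocity: the daily Facebook volume expressed as GB per second.
facebook_rate_gb_s = bytes_per_second(facebook_daily, SECONDS_PER_DAY) / 10**9

# Volume: the hourly Walmart figure scaled up to a full day, in PB.
walmart_daily_pb = walmart_hourly * 24 / PB

print(f"Facebook: about {facebook_rate_gb_s:.1f} GB per second")
print(f"Walmart: about {walmart_daily_pb:.0f} PB per day")
```

Under these assumptions the Facebook figure works out to roughly 5.8 GB of new data per second, and the Walmart figure to about 60 PB per day.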

Sources of Big Data

Because of the advancement of technology, several sources of data are now available for businesses to use. Social media and other interactive platforms are the best-known sources of big data. Google, Facebook, Twitter, YouTube and Instagram act as major providers of insights into consumer behavior and preferences. The internet is another huge source of big data that is commonly available to all; it can be accessed easily, and some data sources can be accessed for free. Cloud computing has become very popular among companies that want to move away from traditional databases, and cloud storage is a huge source of big data that holds valuable real-time information on the economy and businesses. Machine-generated data is also a valuable source of big data. It is taken from the sensors connected to electronic devices such as computers, smartphones, video games, cameras and many others, but its value depends on the accuracy and timeliness of the sensors. Databases deployed at large scale, such as MS Access, Oracle, SQL and Amazon Simple, are another popular source of big data. Transactional data is also an important source of big data and contributes heavily to marketing intelligence.

Components of Big Data

There are three main components to consider when obtaining value from big data for effective marketing.

A. Data Handling

Big data grows rapidly, day by day. Unstructured data grows faster than structured data, as large numbers of videos, photos, posts and other items are published every hour, and handling this data has become a huge challenge. Hadoop is a major player in big data handling; it is an open source framework used for data handling. Google's MapReduce system is another framework used to handle large data sets, but it is not open source. Both technologies use the MapReduce concept of dividing a main task into sub-tasks.
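The MapReduce idea of splitting a main task into sub-tasks can be sketched with a small word count. This is a single-process illustration in plain Python, not the Hadoop or Google API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are made up for this example, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle and reduce one after another over all documents."""
    pairs = []
    for doc in documents:  # each document is an independent sub-task
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical input documents for illustration only.
docs = ["big data is data", "data grows fast"]
counts = word_count(docs)
print(counts)
```

Because each document is mapped independently, the map step is the part that could be spread over many workers. Only the shuffle step needs to bring together values that share the same key.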

B. Data Analytics

Data analytics is also known as data mining. It is the technique of analyzing big data to discover patterns and common characteristics that can be used to create value. There are six main phases in data mining:

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Modeling
  5. Evaluation
  6. Deployment

With the advancement of technology, numerous data mining methods have been invented. Classification, clustering, regression, association rules, outlier detection, sequential patterns and prediction are some of them. The R language and Oracle Data Mining are examples of data mining tools. A tool is chosen according to the characteristics of the data and the business problem.
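As a small illustration of one of these methods, clustering, here is a minimal sketch of k-means on one-dimensional data in plain Python. The function name `kmeans_1d`, the starting centers and the customer spend values are all assumptions made up for this example. Real projects would normally rely on a data mining tool or library instead.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster 1-D values around the given starting centers (basic k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical customer spend values, for illustration only.
spend = [12, 15, 14, 90, 95, 88]
centers, clusters = kmeans_1d(spend, centers=[0, 100])
print(centers)
print(clusters)
```

With these made-up numbers the values separate into a low-spend group and a high-spend group. This is the kind of common characteristic that clustering is meant to reveal.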

C. Data Visualization

Data visualization is the last component of big data. It emphasizes presenting the analyzed data using graphs or maps, which makes the data more understandable and valuable. For a visualization to be successful, factors such as the audience, their point of view and the type of organization should be considered. Tag clouds, clustergrams and history flow are some well-known data visualization techniques.
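The core of a tag cloud is simple: the more often a word appears, the larger it is drawn. The sketch below only computes the font sizes and leaves the drawing to whatever tool is used. The function name `tag_cloud_sizes`, the size range of 10 to 40 points and the sample text are assumptions for illustration, and linear scaling is just one possible choice.

```python
from collections import Counter


def tag_cloud_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size that grows linearly with its frequency."""
    counts = Counter(text.lower().split())
    low, high = min(counts.values()), max(counts.values())
    span = max(high - low, 1)  # avoid division by zero when all counts match
    return {
        word: min_size + (count - low) * (max_size - min_size) / span
        for word, count in counts.items()
    }


# Hypothetical sample text, for illustration only.
sizes = tag_cloud_sizes("data big data value data big")
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {size:.0f}pt")
```

Here the most frequent word gets the largest size and the rarest word the smallest, with the remaining words placed in between.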
