Big Data refers to data whose volume is beyond the storage and processing capabilities of a single physical machine. The volume, variety, veracity and velocity of data coming into your organization continue to reach unprecedented levels. Big Data analytics enables organizations to analyse a mix of structured, semi-structured and unstructured data in search of valuable business information and insights.
Here I will try to explain the idea behind ‘Big Data’ in the simplest way possible:
Let us consider three progressions. In the first progression, historically, data was generated or accumulated by workers/employees of companies who entered it into a computer system.
In the second progression, things moved to the internet and users could generate their own data. Think about Facebook: all the users sign up and enter their data into the web themselves. That scales far better and is an order of magnitude larger than the first progression, because it has grown from just employees entering data to users entering their own data. All of a sudden, the amount of data being accumulated becomes far higher than it was historically.
Now there is a third level in this progression, because machines are accumulating data. Buildings and streets everywhere are filled with monitors and machines that constantly track parameters such as humidity, temperature and electricity usage, and there are satellites orbiting the earth that monitor it 24 hours a day, taking pictures and accumulating data. Once machines start accumulating data, the magnitude becomes higher than what users produce.
So it’s a progression from workers generating data, to users generating data, to machines generating data, and the result is a colossal amount of accumulated data. What is the impact? In the olden days people used relational databases to process data, i.e. the data was brought to the processor. Now things have changed: we have so much data that it overwhelms a single processor, which cannot manage to process all of it. In today’s world multiple processors are coupled together and brought to the data. For example, you have a whole row of servers, each server holds a small part of the whole data set, and you put a processor in each one. This is called parallel processing: the data is processed in many different places at the same time.
Before, data was brought to the processor; now the processor is brought to the data, which is tremendously larger. We have a whopping amount of data, and therefore we need processing power that is whopping as well. This is the technological shift.
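To make the idea of bringing the processor to the data more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the "servers" are simulated by threads on one machine, the readings are made-up numbers, and the function names (split_into_partitions, local_sum, parallel_total) are hypothetical helpers invented for this example, not part of any Big Data product.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal the records round-robin into one list per pretend server."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def local_sum(partition):
    """Work done where the partition lives; only a small result is sent back."""
    return sum(partition)


def parallel_total(records, num_partitions=4):
    """Process every partition at the same time, then combine the partial results."""
    partitions = split_into_partitions(records, num_partitions)
    # Assumption: threads stand in for separate servers in a real cluster.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_sums = list(pool.map(local_sum, partitions))
    return sum(partial_sums)


# Made-up sensor readings: the numbers 1 to 100.
readings = list(range(1, 101))
total = parallel_total(readings)
print(total)
```

Each worker only ever sees its own slice of the data and returns a single number, so the bulk of the data never has to travel to one central processor.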
As the amount of accumulated data increases over the years, so do our storage needs. Companies are accepting that “data is king,” but how do we analyse it?
Technologies used for Data Processing:
Hadoop is an open-source software framework, written in Java, for developing and running distributed applications that process very large data sets on computer clusters built from commodity hardware. It became very popular after companies like Yahoo and Facebook used it to analyse user interaction data.
MapReduce is a programming model for processing and generating massive data sets, including unstructured data, in parallel across a distributed cluster of processors or stand-alone computers.
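The MapReduce model can be illustrated with the classic word-count example. What follows is a single-machine Python sketch of the three conceptual steps (map, shuffle, reduce). It does not use Hadoop's actual API, and the function names and sample documents are assumptions made up for illustration. In a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = []
    for document in documents:
        mapped.extend(map_phase(document))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up input: each string stands in for a document stored on some server.
documents = ["big data is big", "data is everywhere"]
counts = word_count(documents)
print(counts)
```

Because each call to map_phase looks at only one document and each call to reduce_phase looks at only one word, both steps can be spread across as many machines as there are documents or distinct words.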
The Business Benefits of Big Data:
Businesses understand that giving better information to tomorrow’s decision makers leads to better business decisions. Big Data transforms conventional methods of data handling into an intelligent solution by automating delivery of the right information to the right person at the right time. The transformation of business data into business intelligence is normally a costly and time-consuming technical process. Big Data offers a new opportunity to transform data from its raw form into business intelligence more effectively. Big Data tools essentially facilitate data search, access, visualization and mining, delivering information faster and at a lower cost.
Google has the biggest advantage in this scenario. Just think how much data Google accumulates, not just through search but from all the websites it indexes and from the other services it offers, such as Hangouts, Google Drive, Google Analytics and Google AdWords. From all of this, a colossal amount of data is being accumulated. This data is priceless and offers Google a huge opportunity to develop new algorithms and provide new profitable services that it can sell, i.e. to find ways to monetize this data.
Big Data is going to transform our lives: the way we live, work and think. No doubt Big Data is going to be the next Big Thing!