Big Data is a term for data sets so vast and complex that traditional data processing tools and applications cannot handle them. Yet Big Data contains a great deal of valuable information that, if extracted successfully, can help businesses, support scientific research, predict emerging epidemics, and even determine traffic conditions in real time. These data must therefore be collected, organized, stored, searched, and shared differently from ordinary data. In this article, we invite you to learn about Big Data, the methods used to exploit it, and how it helps our lives.
The definition of Big Data
As mentioned above, Big Data is a set of data that exceeds the capabilities of traditional tools and applications. The size of Big Data sets keeps growing: as of 2012, a single data set could range from a few dozen terabytes to petabytes (1 petabyte = 1024 terabytes).
In 2001, analyst Doug Laney of META Group (now part of the research firm Gartner) said that the challenges and opportunities of data growth can be described along three dimensions: increasing volume, increasing velocity, and increasing variety. Today Gartner, along with many other organizations and companies in the information technology field, continues to use this "3V" model to define Big Data. In 2012, Gartner added that, beyond these three properties, Big Data also requires new forms of processing to support decision making, give deeper insight into things and events, and optimize work processes.
We can take the experiments of the Large Hadron Collider (LHC) in Europe as an example of Big Data. When these experiments run, the results are recorded by 150 million sensors that deliver data about 40 million times per second. If the LHC recorded all the results from every sensor, the data flow would become extremely large: it could reach 150 million petabytes per year, or about 500 exabytes per day, 200 times more than all the other data sources in the world combined.
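To put the LHC figures on a common scale, the yearly volume can be converted to a daily one. The short Python sketch below does that arithmetic. The function and variable names are illustrative, and it assumes decimal units (1 exabyte = 1000 petabytes) and a 365-day year, neither of which is stated in the article.

```python
# Assumption: decimal (SI) units, 1 exabyte = 1000 petabytes.
PETABYTES_PER_EXABYTE = 1000
# Assumption: a plain 365-day year.
DAYS_PER_YEAR = 365


def petabytes_per_year_to_exabytes_per_day(pb_per_year: float) -> float:
    """Convert a yearly data volume in petabytes to a daily volume in exabytes."""
    pb_per_day = pb_per_year / DAYS_PER_YEAR
    return pb_per_day / PETABYTES_PER_EXABYTE


# Figure quoted in the article: 150 million petabytes per year.
lhc_pb_per_year = 150_000_000
lhc_eb_per_day = petabytes_per_year_to_exabytes_per_day(lhc_pb_per_year)

print(f"{lhc_eb_per_day:.0f} exabytes per day")
```

Under these assumptions the result is a little over 400 exabytes per day, the same order of magnitude as the rounded daily figure quoted above.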
Additional information on Big Data
In the LHC experiments, about 600 million particle collisions occur every second, but after about 99.999% of the data stream is filtered out, only about 100 collisions are of interest to scientists. This means the LHC's operators have had to find new ways to manage and process this enormous mass of data.
As another example, when the Sloan Digital Sky Survey (SDSS), an astronomical survey based in New Mexico, began operations in 2000, within a few weeks it had collected more data than astronomy had gathered in its entire history. It collects about 200 GB each night, and its total has now passed 140 terabytes. The LSST observatory, the successor to SDSS, is expected to come online in 2016 and will collect the same amount of data in just 5 days.
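The jump from SDSS to LSST is easier to appreciate as a rate. The Python sketch below works it out from the figures quoted above. The variable names are illustrative, and it assumes decimal units (1 terabyte = 1000 gigabytes) and that the 140-terabyte total is the amount LSST would gather in 5 days.

```python
# Assumption: decimal units, 1 terabyte = 1000 gigabytes.
GB_PER_TB = 1000

# Figures quoted in the article.
sdss_gb_per_night = 200
sdss_total_gb = 140 * GB_PER_TB
lsst_days_for_same_volume = 5

# Average LSST daily volume if it gathers the SDSS total in 5 days.
lsst_gb_per_day = sdss_total_gb / lsst_days_for_same_volume

# How many times faster than the SDSS nightly rate that would be.
rate_ratio = lsst_gb_per_day / sdss_gb_per_night

# Number of SDSS observing nights needed to reach the same total.
sdss_nights_needed = sdss_total_gb / sdss_gb_per_night

print(f"LSST: about {lsst_gb_per_day / GB_PER_TB:.0f} TB per day")
print(f"That is about {rate_ratio:.0f} times the SDSS nightly rate")
print(f"SDSS would need about {sdss_nights_needed:.0f} nights for the same volume")
```

In other words, under these assumptions LSST would gather in 5 days what takes SDSS roughly 700 observing nights.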
Decoding the human genome is another example. This work once took up to 10 years to process; now it can be completed in about a week. Likewise, the NASA Center for Climate Simulation stores 32 petabytes of weather observations and simulations on its supercomputers. The images, text, and other multimedia content stored on Wikipedia, together with the record of users' editing behavior, also constitute a Big Data set.