Some information about Big Data today
According to Intel documents from September 2013, the world currently creates 1 petabyte of data every 10 seconds, equivalent to a 13-year-long HD video. Companies and businesses also own Big Data of their own. The online marketplace eBay, for example, uses two data centers with a capacity of up to 40 petabytes to store queries, searches, customer recommendations, and information about its merchandise.
The online retailer Amazon.com has to handle millions of operations every day, as well as requests from about half a million sales partners. Amazon runs on Linux-based systems, and in 2005 it owned the three largest Linux databases in the world, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB.
Similarly, Facebook must manage 50 billion photos uploaded by its users, while YouTube and Google have to store most of their users' queries and videos along with many other kinds of related information.
According to SAS, there are also a few interesting statistics about Big Data:
RFID systems (a form of short-range connection similar to NFC but with a longer range, also used in hotel key cards) generate more than 1,000 times as much data as traditional barcodes.
Within 4 hours on Black Friday 2012, Walmart stores had to handle more than 10 million cash register transactions, or roughly 5,000 items per second.
The UPS courier service receives approximately 39.5 million requests from its customers every day.
VISA handles more than 172,800,000 card transactions in a single day.
Twitter sees 500 million new tweets every day, and Facebook's 1.15 billion members generate a huge mass of text, files, videos, and more.
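To put these figures on a common scale, the short sketch below converts the counts quoted above into average events per second. The function name `per_second` and the dictionary labels are illustrative choices, not something from SAS. Note that the Walmart average over the full four hours comes out well below the quoted per-second figure, which presumably reflects a peak or a per-item count; that reading is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(count, window_seconds):
    """Average number of events per second over a time window."""
    return count / window_seconds


# Figures quoted in the list above: (event count, window in seconds).
quoted = {
    "Walmart, Black Friday 2012 (4 hours)": (10_000_000, 4 * 60 * 60),
    "UPS requests (1 day)": (39_500_000, SECONDS_PER_DAY),
    "VISA card transactions (1 day)": (172_800_000, SECONDS_PER_DAY),
    "Twitter new tweets (1 day)": (500_000_000, SECONDS_PER_DAY),
}

# Average rate for each entry, assuming the load is spread evenly
# across the window (real traffic is burstier than this).
rates = {
    name: per_second(count, window)
    for name, (count, window) in quoted.items()
}

for name, rate in rates.items():
    print(f"{name}: about {rate:,.0f} per second")
```

These are plain averages. They say nothing about peak load, which is usually what a system has to be sized for.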
Technologies used in Big Data
Demand for Big Data is growing so strongly that Software AG, Oracle, IBM, Microsoft, SAP, EMC, HP, and Dell have spent more than 15 billion dollars on companies specializing in data analysis and management. In 2010, the Big Data industry was worth more than 100 billion dollars and was growing at about 10% per year, twice as fast as the software industry as a whole.
As mentioned above, Big Data calls for specialized technologies because of its huge and complex nature. In 2011, McKinsey proposed a set of technologies that can be used with Big Data, including crowdsourcing (drawing on resources from many computing devices around the world to process data jointly), genetic algorithms, machine learning (systems that are able to learn from data, a branch of artificial intelligence), natural language processing (like Siri or Google Voice Search, but more advanced), signal processing, simulation, time series analysis, modeling, and combining powerful servers. These techniques are quite complex, so we will not go into them here.
In addition, databases that support parallel data processing, search-based applications, distributed file systems, cloud systems (including applications, computing resources, and storage space), and the Internet itself are also effective tools for researching and extracting information from Big Data. There are also relational (table-based) databases capable of holding petabytes of data, which can likewise load, manage, back up, and optimize the use of Big Data.
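The idea behind parallel data processing can be sketched in a few lines: split the data into chunks, process each chunk independently, then merge the partial results. The example below is only a minimal illustration on one machine. The function names and sample records are hypothetical, and threads are used purely to keep the sketch portable, whereas a real Big Data system would spread the chunks across many processes or servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_count):
    """Split a list into roughly equal contiguous chunks."""
    size = max(1, -(-len(records) // chunk_count))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_words(chunk):
    """Process one chunk on its own: count the words in its records."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(records, workers=4):
    """Process the chunks concurrently, then merge the partial counts."""
    chunks = split_into_chunks(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical sample records standing in for a much larger data set.
sample = ["big data is big", "data about data", "fast storage", "big storage"]
result = parallel_word_count(sample)
print(result.most_common(3))
```

Because each chunk is processed without looking at the others, the same pattern still works when the chunks live on different machines, which is what makes it attractive for data sets too large for a single computer.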
Those who work with Big Data are often uncomfortable with slow storage systems, so they prefer drives attached directly to the computer (just like the hard drive mounted in our own computers). These drives can range from SSDs to SATA disks located in a large storage grid. These people regard NAS or SAN network storage systems as too complicated, expensive, and slow. Such properties do not suit systems used to analyze Big Data, which aim for high performance, use of commodity infrastructure, and low cost. In addition, Big Data analysis often needs to run in real time or near real time, so latency must be eliminated wherever and whenever possible.