7 things you need to get started with Big Data

This article aims to give readers some criteria for getting into the realm of Big Data/Analytics.


Collect data

Before we talk about the other goals, the first is that you have to collect the data. This job may sound easy, but it is extremely important. The term “big data” itself means that the data is big, so you need data, and a lot of it. The data you collect will affect the information you obtain later.

Of course, you don’t need to keep all of the data for a long time, but you cannot know which data you have, or which new data you will need, before you start collecting. A basic principle: the more useful data you have, the more aspects of it you can analyze.

Thankfully, there is a “kingpin” in the field of storing and processing large data. Its name is Hadoop, and it is completely open source. It can store everything, from web server logs and monitoring information, … to emails and tweets, whether unstructured or structured, …

Once you get started with Hadoop, you will meet many other components and will have to research further. But don’t forget the “kingpin”, Hadoop.

Group the data logically

Once you have data, start analyzing it right away to see whether any of it belongs together. If pieces of data are closely related, collect them into a group, that is, put them in the same bucket.

A few questions you can ask: Which data has the potential to help the business? Which can be analyzed to find a competitive advantage? Which can help you serve customers better? … After grouping and prioritizing, you will easily recognize the data that you want to analyze.

A keyword that you should know is MapReduce. Try to learn about MapReduce if you also dream of mastering Big Data.
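To make the keyword less abstract, here is a minimal sketch of the MapReduce idea in plain Python. It does not use Hadoop at all; the function names, the sample log lines, and the assumed log layout are all made up for illustration. It only shows the three steps: map each record to key/value pairs, group the pairs by key, and reduce each group to one result.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (status_code, 1) pair for one web server log line."""
    # Assumed (hypothetical) log layout: "<method> <path> <status>"
    status = line.split()[-1]
    return [(status, 1)]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values of one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up sample data, not taken from any real system.
LOG_LINES = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /about 200",
]

status_counts = map_reduce(LOG_LINES)
print(status_counts)
```

In a real cluster the map and reduce steps run in parallel on many machines, and the framework handles the grouping in between. The sketch runs everything in one process just to show the shape of the computation.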

Don’t discard the existing system

Discarding the existing system is what many people think of when they read and learn about Big Data systems. They are overwhelmed by Big Data’s processing power and by the information it yields, which is far more multidimensional and covers everything they need. But, honestly, it can’t replace the simple systems they have built before, such as one that produces sales reports.

It’s hard to give a convincing reason for this, but that’s not important. What matters is that for a Big Data system to replace the current system, you have to integrate it with the existing systems, and that takes a great deal of time and effort. And for what benefit? So keep the current system and develop the Big Data system alongside it for analysis, using it only to analyze what the old system could not.

Think about the use of the cloud

Instead of worrying about and calculating how to build infrastructure suited to processing and analyzing Big Data, use a cloud system that already offers MapReduce tools. This saves a lot of time and effort on setup and makes it easy to scale later.

Currently, the large cloud systems support MapReduce out of the box; Amazon Web Services and Google App Engine are examples.

Provide self-service tools

This is extremely important for business people, the ones who truly get big benefits from Big Data. Give them an easy-to-use interface that supports drag and drop and lets them customize the level of detail and the angle from which they view the data.

If that sounds strange, try learning about pivot tables. And if you want a complete tool, there are always Pentaho, Jasper, Tableau, … Most of them have a Community edition (completely free) for you to try, as well as an Enterprise edition.
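For readers who have never met a pivot table, the idea can be sketched in a few lines of plain Python. The sample sales rows, the field names, and the pivot function below are all hypothetical; real tools do the same thing behind a drag-and-drop interface. The point is that the same data can be viewed from different angles simply by choosing which field goes on the rows and which on the columns.

```python
from collections import defaultdict

# Made-up sample data for illustration only.
SALES = [
    {"region": "North", "quarter": "Q1", "amount": 100},
    {"region": "North", "quarter": "Q2", "amount": 150},
    {"region": "South", "quarter": "Q1", "amount": 80},
    {"region": "South", "quarter": "Q1", "amount": 20},
]


def pivot(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    # Convert nested defaultdicts to plain dicts for easy printing.
    return {r: dict(cols) for r, cols in table.items()}


# Same data, two angles of view: swap the row and column fields.
by_region = pivot(SALES, "region", "quarter", "amount")
by_quarter = pivot(SALES, "quarter", "region", "amount")

print(by_region)
print(by_quarter)
```

Swapping the two field names is the code equivalent of dragging a field from the row area to the column area in a self-service tool.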

Think about data management (data governance)

If you are doing (or thinking about) Big Data, you can be sure that your data will become gigantic soon after you start carrying out a Big Data analysis strategy. You have two ways to deal with this: 1) save space, by reducing duplication, compressing data, …; 2) invest in equipment to improve the system’s storage and processing capacity. Whether you choose one way or a combination of both, you need to think about it as early as possible.
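The first option, saving space, can be sketched with nothing but the Python standard library. The records below are made up, and hashing each record to detect duplicates is just one possible approach, chosen here for illustration. The sketch drops exact duplicates and, separately, compresses the raw data with gzip.

```python
import gzip
import hashlib


def deduplicate(records):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        # Hash the record so only a short digest is kept in memory.
        digest = hashlib.sha256(record.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


# Made-up, highly repetitive sample data for illustration only.
RECORDS = ["GET /index.html 200"] * 500 + ["GET /missing 404"] * 500

# Option 1a: reduce duplication.
unique_records = deduplicate(RECORDS)

# Option 1b: compress the data; decompression restores it exactly.
raw = "\n".join(RECORDS).encode("utf-8")
compressed = gzip.compress(raw)

print(len(RECORDS), "records ->", len(unique_records), "unique")
print(len(raw), "bytes ->", len(compressed), "bytes compressed")
```

How much space either technique saves depends entirely on the data; repetitive logs like the sample shrink a lot, while data that is already compressed or highly varied may barely shrink at all.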

Usually, with a normal product, the approach is to build a trial, produce a few initial results, and use them to persuade people to keep investing. The consequence is that you will either spend a great deal on conversion and redesign, or have to accept a system that is not as good as expected.

The best way is to think about data governance right from the start, and to convince the business and infrastructure departments to build a good system that meets the needs of the analysis, with the most economical and appropriate architecture.

Don’t do it alone

Having read this far, you surely understand why you should not do it alone. Big Data is a major undertaking; its benefits do not arrive right away, so it requires patience and long-term goals. The results serve business purposes: analyzing current issues and the future direction of the business. Therefore, there must always be consistency between departments.

Are you a start-up? Carefully weigh building your own system against using existing software. Don’t assume that building it yourself will be cheaper than renting or buying software/services. In fact, the cost of building will be much higher.

