Data

Big Data: what is it?

Faced with the enormous amount of digital data now in circulation, it has become necessary to develop new methods to manage and analyze it. Big Data covers the searching, capturing, storing, sharing, and presenting of these data.

1. What is Big Data?

The term translates literally as "mega data" or "gross data", but "massive data" seems more appropriate. Because of their quantity and volume, conventional management tools are unable to process these data adequately.

This information comes from messages sent, videos published, GPS signals, climate data, online shopping records ... The major players of the web, such as Facebook, Yahoo, or Google, were the first to deploy this new kind of data processing.

Big Data is a double-edged technology: it can bring benefits, but also drawbacks. Experts argue that the impact of the Big Data trend on society is considerable.

2. Analyzing massive data

In addition to managing large amounts of information, the designers of Big Data systems have set the goal of providing everyone with real-time access to databases.

The 3V rule is an essential part of Big Data:

  • Volume refers to the sheer quantity of data to be processed;
  • Variety relates to the different sources and formats of this information;
  • Velocity refers to the speed at which these data are collected, created, and shared.

These three factors must all be taken into account to manage, analyze, and process the considerable amount of information circulating every day. Big Data is an evolution that no one can escape.

3. Technologies related to Big Data

Two major families of technologies have contributed to the development of this new standard of data processing. On the one hand, the capacity to store large volumes of information, linked to the development of cloud computing.

On the other hand, the rise of processing technologies suited to this scale, such as Hadoop or MapReduce. Various solutions exist to improve processing times. To do so, it is important to opt for storage systems more efficient than traditional SQL databases, so that a greater quantity of information can be analyzed faster.
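As a purely illustrative, minimal sketch of that difference in access patterns, the following Python snippet contrasts a relational SQL query with a NoSQL-style key-value lookup; the in-memory table, the keys, and the values are hypothetical and not taken from any particular system.

    import sqlite3

    # Relational (SQL) access: a declarative query evaluated against a schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'Alice')")
    row = conn.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchone()
    print(row[0])  # Alice

    # Key-value (NoSQL-style) access: a direct lookup by key, the pattern that
    # stores such as HBase are designed to serve at very large scale.
    kv_store = {"user:1": {"name": "Alice"}}
    print(kv_store["user:1"]["name"])  # Alice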

Massively parallel processing is also an attractive option. Combining the HDFS file system, the MapReduce algorithm, and the NoSQL database HBase, the Hadoop framework is the most representative example.
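To give an idea of how the MapReduce pattern works, here is a minimal in-memory sketch of the classic word-count example in plain Python. It only imitates the map, shuffle, and reduce phases rather than using the real Hadoop API, and the sample documents are invented for illustration.

    from collections import defaultdict
    from itertools import chain

    def map_phase(document):
        # Map: emit a (word, 1) pair for every word in the input split.
        return [(word, 1) for word in document.split()]

    def shuffle(pairs):
        # Shuffle: group the intermediate values by key.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(groups):
        # Reduce: aggregate the values of each key (here, sum the counts).
        return {key: sum(values) for key, values in groups.items()}

    documents = ["big data needs new tools", "big data moves fast"]
    pairs = chain.from_iterable(map_phase(doc) for doc in documents)
    counts = reduce_phase(shuffle(pairs))
    print(counts)  # {'big': 2, 'data': 2, 'needs': 1, ...}

In a real Hadoop cluster, the map and reduce functions run in parallel on many machines, with HDFS distributing the input splits and the framework handling the shuffle between nodes.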