Reporting — 23 June 2015

Analytics is necessary to understand data coming from Big Data databases. What’s typically meant by “Big Data” is unstructured data that can’t be managed by a traditional database because either there is too much data or the data is too complex .

Related Links
Basic Statistics for Analytics
Linear Regression
Neural Networks

Briefly, here are some concepts to know when dealing with analytics and big data.

Hadoop–This is a database designed to run over a network of computers called a Hadoop Distributed File System (HDFS). The idea is to spread the data across multiple computers so that it can scale without limit. Hadoop is designed to contain unstructured data, although there are different programming techniques to put structure on it so that the data can be processed. Unstructured data could be, for example, logs and streaming data. To say that something is unstructured means that is not every line of the log looks the same. So you cannot set it up in the traditional row, column format. Video and audio data also fits into Hadoop.

MapReduce–This is how a program pulls information from a Hadoop database. Each MapReduce program processes part of a query passed to it. These programs run across hundreds or thousands of Hadoop nodes and retrieve the data. Data is usually retrieved in the format {key, value}.

NoSQL databases–A database like MongoDB does not store data in row, column format. Instead it uses the JSON (JavaScript Object Notation). JSON records are self-describing, meaning there is no metadata. A JSON object could look like this:

{ “device” : 123,
IMEI : 123,
time : 123,

A NoSQL database, like the name suggests, does not require SQL (Structured Query Language) to query it. SQL is good for processing bank transactions and other business processes but not well-suited to analytics and big data.

Metadata–Metadata is data about data. With a traditional hierarchical file system, like a database or your PC or a computer server, the metadata of a file is limited to the following:

file name
file suffix
date created
date last modified

But with Hadoop and other more advanced file systems, you can add whatever you want to the metadata like:

processing instructions
device type
anything else

Object databases–You can put logs, database dumps, images, and videos into a Hadoop database, but an object database is better suited to images and videos only. In a Hadoop database or on your own computer, a file is addressed by its location like //disk/folder/filename.txt. That can get awkward when the directory structure is very long and complicated such as:


An object database replaces a directory structure with an object ID for a simpler way to reference it.

Also, object databases are object-oriented, which matches how Java, Python, and other programs are written. That means one object can take on different formats yet have common elements. For example, a “soccer ball” and “basketball” are both “balls.” Each has the common element “air pressure.” This approach greatly simplifies programming.


About Author

(0) Readers Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

five × = 40