Apache HBase Introduction

Let’s look at what HBase is and why it is needed in the IT field.

What is HBase?

HBase is an open source, NoSQL (non-relational), column-oriented database management system that runs on top of the Hadoop Distributed File System (HDFS). It is an implementation based on Google’s Bigtable paper.

To recall, a column-oriented DBMS (database management system) stores table data by column rather than by row.
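To make the difference concrete, here is a minimal Python sketch of the same small table laid out both ways. The table, its column names, and its values are made up for illustration. It is a simplified picture of the general idea only: HBase itself groups columns into column families rather than storing every column separately.

```python
# Three records of a toy "users" table (hypothetical data).
rows = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ravi", "city": "Delhi"},
    {"id": 3, "name": "Mina", "city": "Pune"},
]

# Row-oriented layout: all values of one record are stored together.
row_store = [tuple(r.values()) for r in rows]

# Column-oriented layout: all values of one column are stored together.
column_store = {col: [r[col] for r in rows] for col in rows[0]}

# Reading a single column only touches that column's values.
cities = column_store["city"]

print(row_store)
print(column_store)
print(cities)
```

In the row layout, reading every city means walking through every full record. In the column layout, the cities already sit next to each other.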

Introduction

We all know that Big Data and Hadoop are in high demand in the IT field, because Hadoop provides a solution for storing and processing huge data sets, ranging from TB (terabytes) to PB (petabytes). When it comes to processing, however, Hadoop performs only batch processing, and data is accessed only sequentially. That is, even a small task requires searching the entire data set, which is time consuming.

To overcome this limitation of Hadoop, HBase came into existence as a random access database.

HBase can be used when we are working with real-time data streams. The HBase concept is based on Google’s Bigtable paper.

Random access is the ability to reach any item in a huge data set directly and efficiently, no matter how many elements the set contains. It is the opposite of sequential access, as shown in the diagram below.
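The contrast between the two access patterns can also be sketched in a few lines of Python. The data set and the key format below are hypothetical, and the dict is only a stand-in for "some index that jumps straight to the record"; it is not meant to show how HBase locates rows internally.

```python
# Hypothetical data set: 100,000 records keyed by an id string.
records = [(f"user{i:06d}", f"value-{i}") for i in range(100_000)]


def sequential_lookup(key):
    """Scan the records in order until the key is found."""
    steps = 0
    for k, v in records:
        steps += 1
        if k == key:
            return v, steps
    return None, steps


# Random access: an index (here a plain dict) goes straight to the record.
index = dict(records)


def random_lookup(key):
    """Fetch the record for a key without scanning the others."""
    return index.get(key)


value, steps = sequential_lookup("user099999")
print(value, steps)                 # the scan visits every record before the last one
print(random_lookup("user099999"))  # same value, no scan
```

The sequential version has to walk past every earlier record to reach the last one, while the indexed version does the same amount of work for any key.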

Why is HBase required?

HBase is essential for the following reasons:

* The main reason for using HBase is random, real-time read/write access to large data sets.

* It is suitable for Online Analytical Processing (OLAP).

* HBase is mainly designed for real-time querying of huge tables (it responds without noticeable delay).

* HBase is required for key-based storage and retrieval of massive amounts of data. Data access is fast in HBase because data is stored in key-value format.

* HBase can be useful for accessing unstructured data.

* HBase is used for time series data. Examples of time series data are given below.

* If your application has a variable schema where each row is slightly different, then you should look at HBase.
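Several of the points above (key-based storage and retrieval, rows with different columns, and time series data) can be illustrated with a small in-memory sketch. The TinyTable class, the "server#timestamp" row key layout, and the sample values are all hypothetical, and this is not the HBase client API. It only imitates the idea of a table whose rows are kept sorted by row key and whose columns are named "family:qualifier".

```python
import bisect


class TinyTable:
    """Toy in-memory table: row key -> {column: value}, row keys kept sorted."""

    def __init__(self):
        self._keys = []  # sorted list of row keys
        self._rows = {}  # row key -> dict of columns

    def put(self, row_key, column, value):
        """Store one cell; a row can have any set of columns."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Key-based lookup of a single row."""
        return self._rows.get(row_key, {})

    def scan(self, start, stop):
        """Yield (row key, columns) for start <= row key < stop."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = TinyTable()

# Time series: row key = "<server>#<timestamp>" (hypothetical key design).
table.put("server1#2024-01-01T10:00", "metrics:cpu", "41")
table.put("server1#2024-01-01T10:01", "metrics:cpu", "57")
table.put("server1#2024-01-01T10:01", "metrics:ram", "72")  # extra column on this row only
table.put("server2#2024-01-01T10:00", "metrics:net", "13")

# Key-based retrieval of one row.
print(table.get("server1#2024-01-01T10:01"))

# All readings for server1: "$" is the character right after "#" in ASCII,
# so this range covers every key that starts with "server1#".
server1 = list(table.scan("server1#", "server1$"))
for key, columns in server1:
    print(key, columns)
```

Because the row keys are sorted, all readings for one server sit next to each other and can be read with a single range scan, and no row is forced to carry columns it does not need.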

Which companies are using HBase, and where?

1. Facebook uses HBase to store the huge data set generated by Facebook Messenger, and it also stores metrics generated by VMs (virtual machines). These metrics could be server logs such as CPU %, RAM usage, network load, etc.

2. The banking sector is also adopting HBase to store time series of financial data, such as which menus users use most while they are using net banking, and to prevent credit card fraud.

Reference

https://en.wikipedia.org/wiki/Apache_HBase

That’s all for this basic introduction to Apache HBase. I hope this article gave you a basic idea of HBase.