When to Use HBase, HDFS, RDBMS, and Hive

Technologies like HBase, HDFS, RDBMS, and Hive are designed for completely different use cases; there is no battle between them. Let's walk through an example to see the caliber of each technology.


Problem Statement

"In the LinkedIn application we find many kinds of data: profile attributes, friend lists, skills, group recommendations, friend suggestions, company recommendations, profile viewers, and so on. The application has hundreds of millions of users, and pages load at lightning speed."

A solution can be built using the technologies below.


HBase

With respect to the above problem statement:

* HBase plays the role of the database: it stores the results of analytics along with other information, such as profiles, to provide fast random access. For example, in the LinkedIn application HBase can serve real-time sentiment analysis built on previously stored data.

Other Applications

* Facebook also uses HBase to store huge amounts of data and to implement its messaging platform.

* HBase is used to access a small set of data from billions of records very quickly.

* For semi-structured, column-oriented data stores, HBase is a good fit.

* HBase is a good choice for real-time data streaming and fast data retrieval.

* HBase is suitable for applications that require low-latency reads and low-latency writes.

* For fault tolerance in big data applications, HBase can be used.

* HBase can absorb high-velocity incoming streams of data.

* For column-level updates, i.e. editing individual column values, HBase is a good choice.
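The access patterns above (fast lookup by row key, editing a single column value) can be sketched with a tiny in-memory model. This is an illustrative toy, not the real HBase client API; the table name, row key, and column names are invented for the example.

```python
# Toy sketch of HBase-style access: rows addressed by a row key,
# cells addressed as "family:qualifier" columns. NOT the real HBase API.

class ToyHBaseTable:
    def __init__(self):
        self._rows = {}  # row_key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        # Column-level update: only the named cell changes,
        # the rest of the row is untouched.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        # Fast random access by row key (O(1) here; HBase achieves
        # the same effect with sorted regions and block caches).
        return self._rows.get(row_key, {})

table = ToyHBaseTable()
table.put("user#42", "profile:name", "Asha")
table.put("user#42", "profile:skills", "Hadoop,Hive")
table.put("user#42", "profile:skills", "Hadoop,Hive,HBase")  # edit one column

print(table.get("user#42")["profile:skills"])  # latest value wins
```

The point of the sketch is the data model: because every cell is independently addressable by (row key, column), a single skill update does not require rewriting the whole profile row.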


Hive

With respect to the above problem statement:

* Hive can be used for the analytical use cases of the LinkedIn application. For example:

1. Using Hive we can run predictive analytics such as skill recommendations and friend recommendations based on your skills.

2. Data scientists also use Hive for ad-hoc analysis and for the descriptive statistics behind internal dashboards.

Other Applications

* Hive lets you write simple SQL queries instead of MapReduce code to do heavy data crunching (automated processing of large amounts of data) on big data sets stored in HDFS.

* Hive is also used as an ETL tool, for example to do batch insertion into HBase.

* For fault tolerance in big data applications, Hive is an option.

* Hive is used as a data warehouse tool to store and process structured data.

* For ad-hoc data analysis, Hive is a natural choice.
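A skill-recommendation query in Hive is essentially a group-by aggregation. The sketch below models that aggregation in plain Python; the `member_skills` table name and the HiveQL shown in the comment are hypothetical, written only to illustrate the shape of such a query.

```python
# Illustrative sketch of the aggregation a Hive query expresses.
# The equivalent HiveQL over a hypothetical `member_skills` table:
#   SELECT skill, COUNT(*) AS members
#   FROM member_skills GROUP BY skill ORDER BY members DESC;
# Here the same group-by is modeled with a Counter.

from collections import Counter

member_skills = [  # (member_id, skill) rows, as a Hive table would hold them
    (1, "Hadoop"), (1, "Hive"),
    (2, "Hadoop"), (2, "HBase"),
    (3, "Hadoop"),
]

skill_counts = Counter(skill for _, skill in member_skills)
top_skill, members = skill_counts.most_common(1)[0]
print(top_skill, members)  # the most widely held skill and its member count
```

In production this aggregation would run as a batch job over HDFS data, and its results could then be batch-inserted into HBase (the ETL pattern mentioned above) for fast serving.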


HDFS

With respect to the above problem statement:

* HDFS is used to store and process LinkedIn's huge volumes of data. For example:

LinkedIn is the largest social network for professionals: more than 400 million profiles (122 million in the US and 33 million in India) across 200+ countries, more than 100 million unique monthly visitors, 3 million company pages, 2 new members joining every second, and 5.7 billion professional searches. All of this data is stored in HDFS.

Other Applications

* HDFS can be used for batch analytics. For example:

Data is stored over a period of time (months or years), and that data is used for batch analysis, for instance to predict business trends in e-commerce companies.

* HDFS can store and process structured, semi-structured, and unstructured data. For example:

1. Social media data is unstructured data generated by platforms like YouTube, Facebook, Twitter, LinkedIn, and Flickr.

2. Semi-structured data includes CSV, XML, and JSON documents generated from the web.

* HDFS fits write-once, read-many use cases. For example, e-commerce companies like Flipkart, Amazon, and eBay dump previous years' data into Hadoop to perform sentiment and mood analysis of customers by season; the same data is read many times to extract value from it.
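The core HDFS design behind these use cases is simple: a file is split into fixed-size blocks, and each block is replicated across several data nodes. The sketch below is a toy model of that idea; the block size, node names, and round-robin placement are invented for illustration (real HDFS defaults to 128 MB blocks with replication factor 3 and rack-aware placement).

```python
# Toy model of HDFS block splitting and replication. Values are
# illustrative: real HDFS uses 128 MB blocks and rack-aware placement.

BLOCK_SIZE = 4          # bytes, toy value (HDFS default: 128 MB)
REPLICATION = 3         # HDFS default replication factor
datanodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical node names

def split_into_blocks(data, block_size=BLOCK_SIZE):
    # A file becomes a sequence of fixed-size blocks.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes, replication=REPLICATION):
    # Round-robin placement sketch; losing one node still leaves
    # two copies of every block, which is the fault-tolerance story.
    return [[nodes[(b + r) % len(nodes)] for r in range(replication)]
            for b in range(num_blocks)]

blocks = split_into_blocks(b"clickstream-log")
print(len(blocks))                                # blocks for this file
print(place_replicas(len(blocks), datanodes)[0])  # replicas of block 0
```

Because blocks are immutable once written, many readers (batch jobs, ad-hoc Hive queries) can scan the same data repeatedly without coordination, which is exactly the write-once, read-many pattern described above.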


RDBMS

With respect to the above problem statement:

* An RDBMS is used for payment gateways in the LinkedIn application because of its strong consistency and availability guarantees.

Other Applications

* An RDBMS keeps data safe in the event of a program crash.

* It provides concurrent access and fault tolerance.

* An RDBMS can handle small or large quantities of data in a uniform manner.

* For generating arbitrary reports, an RDBMS is a good choice.
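The crash-safety and consistency claims above come down to ACID transactions: a money transfer either fully commits or fully rolls back. The sketch below demonstrates this with SQLite as a stand-in for any relational database; the `accounts` table and the simulated crash are of course invented for the example.

```python
# Why an RDBMS suits payments: an ACID transaction either fully commits
# or fully rolls back, so a crash mid-transfer cannot lose money.
# SQLite here is a stand-in for any relational database.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on any error
        conn.execute(
            "UPDATE accounts SET balance = balance - 60 WHERE name = 'alice'")
        raise RuntimeError("simulated crash before crediting bob")
except RuntimeError:
    pass  # the half-done transfer was rolled back automatically

total = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
print(total)  # money is conserved: still 100
```

This atomicity is what HBase and HDFS do not promise across rows and files, which is why the payment path stays on an RDBMS while analytics live in the Hadoop stack.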

That's all for this article. Each technology has its own caliber, and by using them all together in the back end, we get the LinkedIn application we enjoy.