What is Apache Hive: An Introduction

What is Apache Hive?

Apache Hive is an open-source ETL (Extract, Transform, Load) and data warehousing tool built on top of the Hadoop Distributed File System (HDFS). It is used for analyzing structured and semi-structured data, and it makes writing queries and analyzing data simple. Queries in Hive are written in HQL (Hive Query Language), which is similar to SQL.

Apache Hive supports:

  1. Data Definition Language (DDL)
  2. Data Manipulation Language (DML)
  3. User-Defined Functions (UDFs)

Hive makes the job easier through features such as:

  1. Data encapsulation (storage details are hidden from the user)
  2. Ad-hoc queries (queries written as the need arises)
  3. Analysis of huge datasets

Apache Hive Introduction

Apache Hive is a data warehousing tool in the Hadoop ecosystem. Hive was originally developed at Facebook; the Apache Software Foundation later developed it further as an open-source project called Apache Hive.

The main objective behind the development of Hive was to give SQL developers and analysts a simple path to querying and analyzing Big Data. HQL is an SQL-like language that does not require general programming skills, so even people without a programming background can learn it easily. Hive also reduces programmers' work, since a query needs far fewer lines of code than an equivalent hand-written MapReduce program. Many companies use Apache Hive for these benefits, for example Amazon (in Amazon Elastic MapReduce), IBM, Yahoo, Netflix, and the Financial Industry Regulatory Authority (FINRA).

Problems Solved by Apache Hive

Facebook initially used a traditional RDBMS, but as the volume of data it generated grew, the RDBMS could no longer handle it. To overcome this problem, Facebook first turned to MapReduce, but writing MapReduce programs was difficult, so it later developed Apache Hive as a solution. Facebook loads 15 TB of data on a daily basis.
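To give a feel for why hand-written MapReduce was considered heavy, here is a minimal, single-process sketch of a word count written in the map, shuffle, reduce style. It is plain Python that only mimics the three stages; it is not the Hadoop API, and a real Hadoop job needs additional boilerplate for job setup and I/O. In HQL the same result is a single query such as `SELECT word, COUNT(*) FROM words GROUP BY word;`, assuming a hypothetical table `words` with one word per row.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group emitted values by key, as the framework does
    # between the map and reduce stages.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reducer: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the three stages, mirroring one MapReduce job.
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["hive makes queries simple", "hive runs on hadoop"]
    print(word_count(docs))
```

Even in this simplified form, three stages have to be written and wired together, whereas the HQL version states only the desired result and leaves the execution plan to Hive.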

Apache Hive can run thousands of jobs on a cluster, serving hundreds of users across a wide variety of applications.

Apache Hive provides:

  1. Schema flexibility and evolution
  2. Partitioning and bucketing of Hive tables
  3. Hive tables that can be defined directly on data in HDFS
  4. JDBC/ODBC drivers are available
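Partitioning and bucketing decide where a table's rows land on disk. Below is a minimal sketch under simplifying assumptions: a partitioned table keeps each partition in a subdirectory named `column=value` under the table's location, and a bucketed table spreads rows over a fixed number of bucket files by hashing the bucketing column. The table path, the column names, and the integer-hash rule are illustrative assumptions, not Hive's own code; Hive's actual hash function depends on the column type and the Hive version.

```python
from typing import Dict


def partition_path(table_location: str, partition: Dict[str, str]) -> str:
    # Each partition lives in its own subdirectory named key=value,
    # nested in the order the partition columns are declared.
    parts = [f"{key}={value}" for key, value in partition.items()]
    return "/".join([table_location.rstrip("/")] + parts)


def bucket_for(value: int, num_buckets: int) -> int:
    # Simplified bucketing model: bucket = hash(value) mod num_buckets.
    # Assumption: for an INT column the hash is the value itself, masked
    # to a non-negative 32-bit number before taking the modulus.
    return (value & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    # Hypothetical table "sales" partitioned by year and month.
    print(partition_path("/user/hive/warehouse/sales", {"year": "2024", "month": "01"}))
    # Hypothetical table bucketed into 4 buckets on an INT column.
    for customer_id in (10, 11, -1):
        print(customer_id, "->", bucket_for(customer_id, 4))
```

Because a query that filters on `year` and `month` only needs the matching directories, partitioning lets Hive skip the rest of the table. Bucketing on a column keeps rows with equal values in the same bucket file.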

Note: These points are covered in the upcoming topics.
