Apache Hive Architecture and Apache Hive Mode

The Apache Hive architecture and the Apache Hive modes are important topics, so let's discuss them.

Apache Hive Architecture

Major components of the Apache Hive architecture are:

  1. Metastore
  2. Driver
  3. Compiler
  4. Optimizer
  5. Executor
  6. CLI, UI, and Thrift Server
  • Metastore
  1. Stores metadata of the tables such as their schema and location.
  2. It also includes the partition metadata, which helps the driver track the progress of the various data sets distributed over the cluster.
  3. Because the driver relies on this metadata to keep track of the data, the metastore is typically replicated to a backup so that it can be recovered in case of data loss.
  • Driver
  1. Acts like a controller that receives the HiveQL statements.
  2. It starts the execution of a statement by creating a session, and it monitors the life cycle and progress of the execution.
  3. Stores the necessary metadata generated during the execution of a HiveQL statement.
  4. The driver also acts as a collection point for the data, i.e. the query result obtained after the Reduce operation.
  • Compiler
  1. Compiles the HiveQL query, converting it to an execution plan.
  2. The execution plan contains the tasks and steps that Hadoop MapReduce needs to perform to produce the output of the query.
  3. The compiler first converts the query to an abstract syntax tree (AST). After checking for compatibility and compile-time errors, it converts the AST to a directed acyclic graph (DAG).
  • Optimizer
  1. Applies transformations to the execution plan to produce an optimized DAG.
  2. Transformations can be aggregated, for example converting a pipeline of joins to a single join, for better performance.
  3. It can also split tasks, for example applying a transformation to the data before a reduce operation, for better performance and scalability.
  • Executor
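  1. Executes the tasks once compilation and optimization are complete.
  2. It interacts with the job tracker in Hadoop to schedule the tasks to be run.
  3. It takes care of pipelining the tasks, making sure that a task with dependencies runs only after all of its prerequisites have run.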
  • CLI, UI, and Thrift Server
  1. The command-line interface (CLI) lets an external user interact with Hive by submitting queries.
  2. The Thrift server allows external clients to interact with Hive over a network.
  3. The user interface (UI) passes the submitted queries to the driver for execution.
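
The way a statement flows through these components can be sketched in a few lines of Python. This is only a toy model for illustration, not Hive's actual implementation: the class and function names (Metastore, compile_query, optimize, execute, run_query) are hypothetical, the "parser" accepts only the form SELECT <col> FROM <t1> [JOIN <t2> ...], and the plan is a simple ordered list of steps standing in for the DAG.

```python
from dataclasses import dataclass, field


@dataclass
class Metastore:
    """Toy metastore: table name -> schema and storage location."""
    tables: dict = field(default_factory=dict)

    def register(self, name, columns, location):
        self.tables[name] = {"columns": columns, "location": location}


def compile_query(query, metastore):
    """Compiler: parse the query into an AST, check it, and emit a plan."""
    tokens = query.rstrip(";").split()
    # Only the toy form 'SELECT <col> FROM <t1> [JOIN <t2> ...]' is supported.
    if len(tokens) < 4 or tokens[0].upper() != "SELECT" or tokens[2].upper() != "FROM":
        raise ValueError(f"unsupported query: {query}")
    rest = tokens[4:]
    if len(rest) % 2 != 0 or any(kw.upper() != "JOIN" for kw in rest[0::2]):
        raise ValueError(f"unsupported query: {query}")
    ast = {"select": tokens[1], "from": tokens[3], "joins": rest[1::2]}
    # Semantic check: every referenced table must exist in the metastore.
    for table in [ast["from"], *ast["joins"]]:
        if table not in metastore.tables:
            raise ValueError(f"unknown table: {table}")
    # Plan: an ordered list of (operation, argument) steps.
    plan = [("scan", ast["from"])]
    plan += [("join", [table]) for table in ast["joins"]]
    plan.append(("project", ast["select"]))
    return plan


def optimize(plan):
    """Optimizer: merge consecutive join steps into a single multi-way join."""
    optimized = []
    for op, arg in plan:
        if op == "join" and optimized and optimized[-1][0] == "join":
            optimized[-1] = ("join", optimized[-1][1] + arg)
        else:
            optimized.append((op, arg))
    return optimized


def execute(plan):
    """Executor: a real one would schedule tasks; this one only lists them."""
    return list(plan)


def run_query(query, metastore):
    """Driver: receive the statement and push it through every stage."""
    return execute(optimize(compile_query(query, metastore)))


store = Metastore()
for name in ("orders", "users", "items"):
    store.register(name, ["id"], f"/warehouse/{name}")
print(run_query("SELECT id FROM orders JOIN users JOIN items;", store))
```

In this sketch the two join steps produced by the compiler are collapsed by the optimizer into one step before the executor sees them, which mirrors the "pipeline of joins to a single join" transformation described above.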

 

Apache Hive Mode

Apache Hive can operate in one of two modes, depending on the number of data nodes in Hadoop and the size of the data:

  1. Local mode
  2. MapReduce mode

Local mode

  • Local mode is used when Hadoop is installed in pseudo-distributed mode with a single data node.
  • It suits a single local machine that holds a small amount of data.
  • Because the data sets are small, processing on the local machine is very fast.
  • To work in Hive local mode, set the following property (on Hadoop 2 and later, the equivalent property is mapreduce.framework.name):
 SET mapred.job.tracker=local;

MapReduce mode

  • MapReduce mode is used when Hadoop is installed with multiple data nodes and the data is distributed across different nodes.
  • It is used when the data sets are large and queries need to execute in parallel.
  • This mode achieves better performance on large data sets.

Note: By default, Apache Hive works in MapReduce mode; for local mode, apply the setting shown above.
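
As a rough illustration of this rule of thumb, the choice of mode can be written as a small Python helper. The function name hive_mode_settings and the idea of choosing purely by data-node count are assumptions made for this sketch; it simply returns the session statements a user would run, using the property shown above.

```python
def hive_mode_settings(data_nodes):
    """Return the Hive session statements for a given number of data nodes.

    Hypothetical helper: a single data node (pseudo-distributed install)
    maps to local mode; anything larger keeps the default MapReduce mode.
    """
    if data_nodes < 1:
        raise ValueError("a Hadoop installation needs at least one data node")
    if data_nodes == 1:
        # Local mode must be requested explicitly.
        return ["SET mapred.job.tracker=local;"]
    # MapReduce mode is the default, so nothing needs to be set.
    return []


for nodes in (1, 5):
    print(nodes, hive_mode_settings(nodes))
```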

 

References for Apache Hive Architecture and Apache Hive Mode

https://en.wikipedia.org/wiki/Apache_Hive

 

That’s all about the Apache Hive architecture and the Apache Hive modes; let’s move on to further topics.