Features and Limitations of Apache Hive

Features of Apache Hive

Let us discuss the features of Apache Hive one by one:

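  1. Apache Hive makes data summarization, querying, and analysis much easier, because these tasks are written as SQL-like queries.
  2. It stores the schema (metadata) in a relational database and the processed data in HDFS.
  3. It supports OLAP (Online Analytical Processing).
  4. Apache Hive manages the low-level interface requirements of Hadoop for the user: queries are compiled into jobs for the underlying execution engine, so they do not have to be written by hand.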
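  5. Partitioning and bucketing the data in Hive tables improves query performance, because a query can skip the data it does not need.
  6. It uses a rule-based optimizer (a set of rules applied to the query plan before execution) to produce an efficient plan; later versions also provide a cost-based optimizer.
  7. It is scalable, familiar, and extensible, i.e., it can work with a huge volume and variety of data by scaling out across the cluster rather than degrading the performance of the system.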
  8. Hive supports client applications (applications that run on a workstation or personal computer) written in Java, PHP, Python, C++, and Ruby.
  9. It is an efficient ETL (Extract, Transform, Load) tool.
  10. Working with HiveQL does not require knowledge of a programming language; knowledge of basic SQL queries is enough.
  11. We can easily process structured data in Hadoop using Hive.
  12. We can also run ad-hoc queries (one-off queries written on the fly, whose form depends on the question at hand) for data analysis using Hive.
  13. It can be used for data visualization, and integrating Hive with Apache Tez lowers query latency, bringing it closer to interactive speeds.
  14. It runs on the server side of a cluster.
  15. Apache Hive is used in areas such as:
  • Log processing (system or network logs)
  • Text mining (deriving high-quality information from text)
  • Business analytics (investigating past business performance to improve profit)
  • Predictive modeling (using statistics to predict outcomes)
  • Document indexing (organizing and storing documents for later use)
  • Data mining (analyzing a large amount of data in a database)
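
Feature 5 above can be illustrated with a small model. The sketch below is not Hive code. It is a minimal Python model built on two assumptions: a partition behaves like one directory per column value, and a bucket behaves like one file chosen by hashing a column. The names PartitionedTable, insert, and scan are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model of a table partitioned by one column and bucketed by another.

    Assumption: a partition acts like a directory per partition value, and a
    bucket like a file selected by hashing the bucketing column.
    """

    def __init__(self, partition_col, bucket_col, num_buckets):
        self.partition_col = partition_col
        self.bucket_col = bucket_col
        self.num_buckets = num_buckets
        # partition value -> bucket index -> list of rows
        self.partitions = defaultdict(lambda: defaultdict(list))

    def insert(self, row):
        part = row[self.partition_col]
        # Simplification: integer keys are bucketed by value modulo bucket count.
        bucket = row[self.bucket_col] % self.num_buckets
        self.partitions[part][bucket].append(row)

    def scan(self, partition_value=None, bucket_key=None):
        """Return (matching_rows, rows_read); filters prune partitions and buckets."""
        if partition_value is None:
            parts = list(self.partitions.values())
        elif partition_value in self.partitions:
            parts = [self.partitions[partition_value]]
        else:
            parts = []

        matches, rows_read = [], 0
        for buckets in parts:
            if bucket_key is None:
                files = list(buckets.values())
            else:
                # Only the bucket that can contain the key is read.
                files = [buckets.get(bucket_key % self.num_buckets, [])]
            for rows in files:
                rows_read += len(rows)
                for row in rows:
                    if bucket_key is None or row[self.bucket_col] == bucket_key:
                        matches.append(row)
        return matches, rows_read


table = PartitionedTable("dt", "user_id", num_buckets=4)
for day in ("2024-01-01", "2024-01-02", "2024-01-03"):
    for user_id in range(8):
        table.insert({"dt": day, "user_id": user_id, "clicks": user_id * 10})

all_rows, read_full = table.scan()
day_rows, read_day = table.scan(partition_value="2024-01-02")
user_rows, read_user = table.scan(partition_value="2024-01-02", bucket_key=5)
print(read_full, read_day, read_user)  # 24 8 2
```

In this model, filtering on the partition column cuts the rows read from 24 to 8, and also filtering on the bucketing column cuts them to 2. How much data a real Hive query skips depends on the execution engine and its configuration.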

 

Limitations of Apache Hive

Some of the limitations of Apache Hive are as follows:

  1. Apache Hive does not offer real-time queries, and row-level updates are not available on ordinary (non-transactional) tables.
  2. The latency of Apache Hive queries is generally very high.
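  3. Subquery support is limited.
  4. Materialized views are not supported in versions before Hive 3.0.
  5. UPDATE and DELETE operations are supported only on transactional (ACID) tables, available since Hive 0.14; ordinary tables do not support them.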
  6. It is not designed for OLTP (Online Transaction Processing).
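
The row-level update limitation follows from how the data is stored. The sketch below is a simplified Python model, not Hive or Hadoop code. It assumes that a data file cannot be edited in place once written (HDFS files are written once and then read many times), so changing a single row means rewriting the file that holds it. ImmutableFileStore and its methods are hypothetical names used only for this illustration.

```python
class ImmutableFileStore:
    """Toy model of write-once storage: files are created or replaced, never edited."""

    def __init__(self):
        self.files = {}        # file name -> tuple of (key, value) rows
        self.rows_written = 0  # total rows written, as a rough cost measure

    def write_file(self, name, rows):
        self.files[name] = tuple(rows)
        self.rows_written += len(rows)

    def update_row(self, name, key, new_value):
        """Change one row by rewriting the entire file that contains it."""
        rewritten = [
            (k, new_value) if k == key else (k, v)
            for k, v in self.files[name]
        ]
        self.write_file(name, rewritten)


store = ImmutableFileStore()
store.write_file("part-00000", [(i, "old") for i in range(1000)])
before = store.rows_written

store.update_row("part-00000", key=42, new_value="new")
cost = store.rows_written - before
print(cost)  # 1000: one changed row cost a full-file rewrite
```

In this model the cost of an update grows with the size of the file rather than with the number of changed rows, which suits batch analytics far better than OLTP workloads. Transactional tables take a different approach: they record changes separately and merge them into the base data later.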

References

https://en.wikipedia.org/wiki/Apache_Hive#Features