Hive DML commands

    Let us study some of the Hive DML(data manupilation language) Commands like,

    1. Hive join                           

    Hive supports four types of join they are,

    1. Inner join

    2. Left outer Join

    3. Right Outer Join

    4. Full Outer Join

    INNER JOIN

    Returns similar record values from the two tables (emp1 and emp2).

    LEFT JOIN (LEFT OUTER JOIN)

    It Returns all the values from the emp1 (left table), and the matched values from the emp2 (right table), or NULL in case of no matching join found.

    RIGHT JOIN (RIGHT OUTER JOIN) 

    It Returns all the values from the emp2 (right table), and the matched values from the emp1 (left table), or NULL in case of no matching join found.

    FULL JOIN (FULL OUTER JOIN)

    It will select all values that match either left or right table records.

    Example

    Let us consider two tables called emp1 and emp2, as shown below.

    Inner join

    hive>create hive_join

    >use hive_join

    >select a.name,a.age,a.sal,b.email

    >from emp1 a

    >join

    >emp2 b

    >on a.name=b.name;

    Note: In the above example hive_join is a database name.

    Left join

    hive> select a.name,a.age,a.sal,b.email

    >from emp1 a

    >left outer join

    >emp2 b

    >on a.name=b.name;

    Right join

    hive> select a.name,a.age,a.sal,b.email

    >from emp1 a

    >right outer join

    >emp2 b

    >on a.name=b.name;

    Full join

    hive> select a.name,a.age,a.sal,b.email

    >from emp1 a

    >full join

    >emp2 b

    >on a.name=b.name;

     2.Order by

    * Order by clause used to sort the results of a query, either Ascending or Descending.It sorts ascending by default, to specify descending use Desc.

    * It will sort all of the data together, which has to pass through one reducer. Thus, ORDER BY in hive uses single reducer.

    Example

    Assume employee table with the columns Id, Name, Salary, Designation, and Dept.

    hive> select Id, Name, Dept from employee Order by dept;

    In the above query it operates on the dept column, it arrange the data by default in ascending order.

    3. Group by

    * It is used to query a group of records.

    * This clause is used to group all the records based on result set using a particular collection column.

    Example

    Assume employee table with the columns Id, Name, Salary, Designation, and Dept.

    hive> select Dept,count(*) from employee group by dept;

    In the above query it retrieves the number of employees in each department.

    4. Sort by

    * Sort by clause performs on column of Hive tables to sort the output.

    * It will sort the data within each reducer. we can use any number of reducers for SORT BY operation.

    * Sort by clause depends on column types. i.e if it is numeric type then sort in numeric order if it is string it will sort in lexicographical order.

    Example

    Assume employee table with the columns Id, Name, Salary, Designation, and Dept.

    hive>select * from employee sort by id desc;

    In the above query it sort the id in descending order.

    5. Cluster by

    * cluster by clause is an alternative clause for both distribute BY and Sort BY clauses in HiveQL.

    It store results into multiple slices(reducer).

    * It cannot guarantee the global ordering.

    Example

    Assume employee table with the columns Id, Name, Salary, Designation, and Dept.

    hive> select id, name from employee cluster by id;

    6. Distribute by

    Distribute By clause is used to distribute the rows to different reducers based on the values of columns i.e partitioning in Map-Reduce, where all the records having same value of partition goes to the same reducer.

    * It does not guarantee clustering(ordering) the rows, based on the distributed By columns, in the reducers.

    * Columns are used in distributed by clause  to distribute the rows among reducers(i.e All rows with the same Distribute By columns will go to the same reducer).

    Example

    hive> select  id, name from employee distribute by id;