Sorting in Pig

Let’s look at sorting in Pig.

Sorting arranges data in a systematic order, either ascending or descending. Apache Pig supports sorting in Pig Latin with the help of two operators: ORDER BY, which sorts a relation, and LIMIT, which restricts how many of the sorted tuples are returned.

1. ORDER BY Operator

* The ORDER BY operator sorts a relation based on one or more fields.

Example

In this example we assume a file “employee_details.txt” in the HDFS directory ‘/beyond_empdata/’, as shown below. The file contains information about the beyond employees, and we sort those employees by age. Follow the steps below.

employee_details.txt

100,Roshan,23,HR

101,Roy,27,CS

102,Shruthi,31,IT

103,Disha,28,EC

104,Gowri,30,HR

105,Drusya,25,HR

106,manju,34,IT

Step 1: Load the file “employee_details.txt” into Pig using the LOAD operator so that it can be sorted.

grunt> employee_details = LOAD 'hdfs://localhost:9000/beyond_empdata/employee_details.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int, dept:chararray);

Step 2: Sort the relation in descending order of employee age and store the sorted relation in Result.

grunt> Result = ORDER employee_details BY age DESC;

Step 3: Verify the sorted result using the DUMP operator.

grunt> Dump Result;
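
To make the expected ordering concrete, here is a minimal sketch in plain Python (not Pig) that mirrors the three steps above on the same sample rows. The names Employee, RAW, load and by_dept_then_age are hypothetical helpers introduced only for this illustration, and the second sort, by department and then age, is an assumed example of sorting on more than one field.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    id: int
    name: str
    age: int
    dept: str


# Sample rows copied from employee_details.txt above.
RAW = """100,Roshan,23,HR
101,Roy,27,CS
102,Shruthi,31,IT
103,Disha,28,EC
104,Gowri,30,HR
105,Drusya,25,HR
106,manju,34,IT"""


def load(text: str) -> list[Employee]:
    """Split each line on ',' and cast the fields, as the LOAD schema does."""
    rows = []
    for line in text.splitlines():
        emp_id, name, age, dept = line.split(",")
        rows.append(Employee(int(emp_id), name, int(age), dept))
    return rows


employee_details = load(RAW)

# Mirrors: Result = ORDER employee_details BY age DESC;
result = sorted(employee_details, key=lambda e: e.age, reverse=True)

# Sorting on more than one field: dept ascending, then age descending.
by_dept_then_age = sorted(employee_details, key=lambda e: (e.dept, -e.age))

# Mirrors: Dump Result;
for row in result:
    print(tuple(row))
```

With the sample data, the oldest employee (manju, 34) comes first and the youngest (Roshan, 23) comes last. In Pig Latin, the multi-field variant would be written by listing the fields after BY, each with its own ASC or DESC.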

2. LIMIT Operator

* The LIMIT operator limits the number of output tuples.

Example

In this example we again assume a file “employee_details.txt” in the HDFS directory ‘/beyond_empdata/’, as shown below. The file contains information about the beyond employees; we sort those employees by age and then limit the number of output tuples. Follow the steps below.

employee_details.txt

100,Roshan,23,HR

101,Roy,27,CS

102,Shruthi,31,IT

103,Disha,28,EC

104,Gowri,30,HR

105,Drusya,25,HR

106,manju,34,IT

Step 1: Load the file “employee_details.txt” into Pig using the LOAD operator so that it can be sorted.

grunt> employee_details = LOAD 'hdfs://localhost:9000/beyond_empdata/employee_details.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int, dept:chararray);

Step 2: Sort the relation in descending order of employee age and store the sorted relation in Result.

grunt> Result = ORDER employee_details BY age DESC;

Step 3: Keep only the top five tuples of the sorted relation, that is, the five oldest beyond employees, using the LIMIT operator.

grunt> limit_data = LIMIT Result 5;

Step 4: Verify limit_data using the DUMP operator.

grunt> Dump limit_data;
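
As a rough check of what the steps above should return, the following plain-Python sketch (not Pig) applies the same sort and limit to the sample rows. The list literal and the use of slicing to stand in for LIMIT are assumptions made for illustration only.

```python
# Rows as (id, name, age, dept), copied from employee_details.txt above.
employee_details = [
    (100, "Roshan", 23, "HR"),
    (101, "Roy", 27, "CS"),
    (102, "Shruthi", 31, "IT"),
    (103, "Disha", 28, "EC"),
    (104, "Gowri", 30, "HR"),
    (105, "Drusya", 25, "HR"),
    (106, "manju", 34, "IT"),
]

# Mirrors: Result = ORDER employee_details BY age DESC;
result = sorted(employee_details, key=lambda row: row[2], reverse=True)

# Mirrors: limit_data = LIMIT Result 5;  (keeps at most the first five tuples)
limit_data = result[:5]

# Mirrors: Dump limit_data;
for row in limit_data:
    print(row)
```

The five rows kept are those for manju, Shruthi, Gowri, Disha and Roy, in that order. If the relation held fewer than five tuples, the slice would simply return all of them, which matches how LIMIT behaves when the requested count exceeds the size of the relation.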

That’s all about sorting in Pig. Hope this will be useful for beginners.