Combining and Splitting in Pig Latin

This tutorial covers how to combine and split relations in Pig Latin.

1. Combining in Pig Latin

In Pig Latin, the contents of two or more relations can be merged using the UNION operator.

Union Operator

* The UNION operator computes the union of two or more relations.

Example

Consider two files, “employee1.txt” and “employee2.txt”, in the HDFS directory ‘/pigempdata/’, as shown below.

employee1.txt

100,Roshan,23,HR

101,Roy,27,CS

102,Shruthi,31,IT

103,Disha,28,EC

104,Gowri,30,HR

employee2.txt

105,Drusya,25,HR

106,manju,34,IT

Step 1: Load the data from both text files into Pig using the LOAD operator.

grunt> emp1 = LOAD 'hdfs://localhost:9000/pigempdata/employee1.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int, dept:chararray);

grunt> emp2 = LOAD 'hdfs://localhost:9000/pigempdata/employee2.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int, dept:chararray);
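
Since the two relations are about to be merged, it can help to confirm that they were loaded with the same schema. A minimal check, assuming emp1 and emp2 were loaded as in Step 1, uses Pig's DESCRIBE operator:

grunt> DESCRIBE emp1;

grunt> DESCRIBE emp2;

Both commands should report the same four fields (id, name, age, dept) with the same types.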

Step 2: Merge the contents of the two relations using the UNION operator.

grunt> beyond_emp = UNION emp1, emp2;

Step 3: Verify the result using the DUMP operator.

grunt> DUMP beyond_emp;
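
UNION keeps duplicate tuples and does not guarantee the order of its output. As a sketch, assuming emp1 and emp2 are loaded as in Step 1, the following variation matches fields by name with UNION ONSCHEMA and then removes any duplicate tuples with DISTINCT. The relation names all_emp and unique_emp are illustrative and are not part of the original example.

-- all_emp and unique_emp are illustrative names

grunt> all_emp = UNION ONSCHEMA emp1, emp2;

grunt> unique_emp = DISTINCT all_emp;

grunt> DUMP unique_emp;

Assuming the files contain exactly the rows shown above, the result should contain the seven employee tuples, for example (100,Roshan,23,HR), in no guaranteed order.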

2. Splitting in Pig Latin

In Pig Latin, the SPLIT operator splits the contents of a relation into two or more relations based on conditions.

Split Operator

* The SPLIT operator partitions a relation into two or more relations.

Example

In this example, consider a file “employee.txt” in the HDFS directory ‘/beyond_empdata/’, as shown below.

employee.txt

100,Roshan,23,HR

101,Roy,27,CS

102,Shruthi,31,IT

103,Disha,28,EC

104,Gowri,30,HR

105,Drusya,25,HR

106,manju,34,IT

Step 1: Load the file into Pig using the LOAD operator.

grunt> employee_details = LOAD 'hdfs://localhost:9000/beyond_empdata/employee.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int, dept:chararray);

Step 2: Split the relation into two relations:

  1. Employees whose age is less than 25
  2. Employees whose age is greater than 23 and less than 30

grunt> SPLIT employee_details INTO emp_details1 IF age < 25, emp_details2 IF (age > 23 AND age < 30);

Step 3: Verify the relations emp_details1 and emp_details2 using the DUMP operator.

grunt> DUMP emp_details1;

grunt> DUMP emp_details2;
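
With SPLIT, a tuple can land in more than one output relation if it satisfies several conditions, and it is left out of all of them if it satisfies none. As a sketch, assuming employee_details is loaded as in Step 1, the OTHERWISE clause (available in newer Pig releases) collects the tuples not matched by the other conditions. The relation names under_25, rest, and under_25_alt are illustrative and are not part of the original example.

-- under_25 and rest are illustrative names

grunt> SPLIT employee_details INTO under_25 IF age < 25, rest OTHERWISE;

grunt> DUMP under_25;

grunt> DUMP rest;

Assuming the file contains exactly the seven rows shown above, under_25 should hold only Roshan (age 23), and the remaining six employees go to rest. Each branch of a SPLIT behaves like a FILTER with the same condition, so the first branch could also be written as:

grunt> under_25_alt = FILTER employee_details BY age < 25;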

That’s all about combining and splitting in Pig Latin.