Kindly find the solution to the problem below along with basic explanation about SPLIT operator:
- The SPLIT operator is used to break a relation into two new relations. So you need to take care of both conditions , like IF and ELSE:
For instance: IF(Something matches) then make Relation1, IF(NOT(something
matches) then make another relation. ( You don't have else keyword in Pig).
- SPLIT operation is an independent operation, meaning that you cant store the SPLIT operation in a relation:
Example:
Bag = split emp into mngr if job == 'MANAGER'; // This is wrong.
You can't represent a SPLIT operation by a relation.
It will execute independently on the GRUNT shell or Script like this :
*SPLIT emp INTO managers IF(job MATCHES '.MANAGER.'),not_managers IF(NOT(job MATCHES '.MANAGER.'));*
Here is an example data set and output for your reference:
**
**
Ron,1331,MANAGER,7232332.34
John,4332,ASSOCIATE,45534.6
Michell,4112,MANAGER,8342423.43
Tamp,1353,ASSOCIATE,34324.67
Ramo,2144,MODULE LEAD,845433.32
Shina,1389,MANAGER,8345321.78
Chin,4323,MODULE LEAD,455465.42
SCRIPT:
emp = LOAD 'stackfile.txt' USING PigStorage(',') AS (ename:chararray,id:int,job:chararray,sal:double);
SPLIT emp INTO managers IF(job MATCHES '.*MANAGER.*'),not_managers IF(NOT(job MATCHES '.*MANAGER.*'));
DUMP managers;
OUTPUT:
(Ron,1331,MANAGER,7232332.34)
(Michell,4112,MANAGER,8342423.43)
(Shina,1389,MANAGER,8345321.78)