
Structure of bag:

emp = LOAD '...../emp.csv' using PigStorage(',') AS
      (ename:chararray,id:int,job:chararray,sal:double);

This bag contains details of employees. I want to split the data based on job.

Bag = split emp into mngr if job == 'MANAGER';

This does not work and gives Error 1200.

If I add one more condition, e.g. sal10k if sal<10000, then it works. Why does it not work with only the one chararray condition?

I am new to Hadoop Pig and know only a few basics. Kindly help.

Andry
Sanjeev
Comment: As per http://pig.apache.org/docs/r0.7.0/piglatin_ref1.html#Case+Sensitivity, case sensitivity may be causing the problem. Try Bag = SPLIT emp INTO mngr IF job == 'MANAGER'; – Rajen Raiyarela May 25 '15 at 09:54

2 Answers


I think you are using the SPLIT operator incorrectly. This is the syntax from the documentation: SPLIT alias INTO alias IF expression, alias IF expression [, alias IF expression …] [, alias OTHERWISE];

So don't put "Bag =" at the start of the statement.
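As a minimal sketch of the statement written to follow the quoted syntax (not tested against any particular Pig version): the path is the elided placeholder from the question, and `not_mngr` is a hypothetical alias introduced here only so that the statement has the two branches the syntax lists.

```pig
-- Sketch only: the path is the elided placeholder from the question.
emp = LOAD '...../emp.csv' USING PigStorage(',') AS
      (ename:chararray, id:int, job:chararray, sal:double);

-- No "Bag =" in front, and two branches as in the quoted syntax.
-- not_mngr is a hypothetical alias for the remaining rows.
SPLIT emp INTO mngr IF job == 'MANAGER',
               not_mngr IF job != 'MANAGER';

DUMP mngr;
```

Rows whose job is null satisfy neither condition, so they would end up in neither relation.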

Low

Below is the solution, along with a basic explanation of the SPLIT operator:

  1. The SPLIT operator breaks a relation into two or more new relations, so you need to cover both cases, like IF and ELSE. For instance: IF (something matches) make Relation1; IF (NOT (something matches)) make another relation. (Pig has no ELSE keyword; the SPLIT syntax in the documentation lists an OTHERWISE branch instead.)
  2. SPLIT is a standalone statement, meaning that you can't assign its result to a relation:

Example: Bag = split emp into mngr if job == 'MANAGER'; // This is wrong.

You can't assign a SPLIT operation to a relation. It runs as a standalone statement in the Grunt shell or in a script, like this:

SPLIT emp INTO managers IF(job MATCHES '.*MANAGER.*'),not_managers IF(NOT(job MATCHES '.*MANAGER.*'));

Here is an example data set and output for your reference:

DATASET:

Ron,1331,MANAGER,7232332.34
John,4332,ASSOCIATE,45534.6
Michell,4112,MANAGER,8342423.43
Tamp,1353,ASSOCIATE,34324.67
Ramo,2144,MODULE LEAD,845433.32
Shina,1389,MANAGER,8345321.78
Chin,4323,MODULE LEAD,455465.42

SCRIPT:

emp = LOAD 'stackfile.txt' USING PigStorage(',') AS (ename:chararray,id:int,job:chararray,sal:double);

SPLIT emp INTO managers IF(job MATCHES '.*MANAGER.*'),not_managers IF(NOT(job MATCHES '.*MANAGER.*'));

DUMP managers;

OUTPUT:

(Ron,1331,MANAGER,7232332.34)
(Michell,4112,MANAGER,8342423.43)
(Shina,1389,MANAGER,8345321.78)
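
The second branch could probably also be written with OTHERWISE, which appears in the syntax quoted in the other answer. This is a sketch that assumes a Pig release whose SPLIT accepts an OTHERWISE branch; it reuses the file name and aliases from the script above and has not been run here.

```pig
emp = LOAD 'stackfile.txt' USING PigStorage(',') AS (ename:chararray,id:int,job:chararray,sal:double);

-- Assumption: this Pig release supports OTHERWISE in SPLIT.
-- not_managers collects the rows that did not match the first condition.
SPLIT emp INTO managers IF (job MATCHES '.*MANAGER.*'), not_managers OTHERWISE;

DUMP not_managers;
```

This avoids repeating the regular expression inside a NOT(...) condition.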
CodeReaper