
I have a dataset in Pig that goes like this:

Name Class Subject Marks  
Andy 1     Maths    10  
John 1     Maths    20  
Mark 2     Maths    20  
Tony 2     Geo      30  

But I need to change it to:

Name Class Maths Geo  
Andy 1     10    0  
John 1     20    0  
Mark 2     20    0  
Tony 2     0     30  

Can anyone suggest how to do this within Pig? I am also trying to write a Python script that takes the data and transposes it. Thanks in advance :)

Simeon Visser
user3065910
    Are `Maths` and `Geo` the only subjects, or do you need to be able to handle more cases? – duber Apr 08 '14 at 15:51

2 Answers


Grouping by (Name, Class) should give you a BAG containing all the marks of a student. You can then write a simple UDF that takes this BAG as input and generates the desired output.
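
As a rough illustration of what such a UDF might compute, here is a minimal sketch in plain Python. The function name `marks_by_subject` and its `subjects` parameter are hypothetical, the bag is modelled as a list of (subject, marks) tuples, and registering the function with Pig is left out.

```python
def marks_by_subject(bag, subjects):
    """Turn one student's bag of (subject, marks) tuples into a row of marks.

    `subjects` fixes the column order (an assumption of this sketch);
    a subject the student has no marks for yields 0.
    """
    totals = {subject: 0 for subject in subjects}
    for subject, marks in bag:
        if subject in totals:          # ignore subjects that are not requested
            totals[subject] += marks
    return tuple(totals[subject] for subject in subjects)


# Tony's group after grouping by (Name, Class) holds a single Geo entry.
print(marks_by_subject([("Geo", 30)], ["Maths", "Geo"]))  # (0, 30)
```

Passing the list of subjects explicitly keeps the output columns in a predictable order for every group.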

Bharat Jain

If your subjects are already defined (I mean, if the set of subjects is static), you can write this simple code and avoid writing a UDF:

A = LOAD .... AS Name, Class, Subject, Marks ...; 

B = FOREACH A GENERATE (Subject == 'Maths' ? Marks : 0) AS Maths, (Subject == 'Geo' ? Marks : 0) AS Geo, Class, Name;

And if you want to aggregate your data by Name and Class:

C = GROUP B BY (Name, Class);

D = FOREACH C GENERATE group, SUM(B.Maths) AS Maths, SUM(B.Geo) AS Geo;

E = FOREACH D GENERATE FLATTEN(group), Maths, Geo;

Of course, this snippet only works if the subjects are known in advance (two of them here). :)
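
If the subjects are not known up front, the pivot could also be done outside Pig in a small Python script, as the question mentions. The following is only a sketch under assumptions: the function name `pivot` is hypothetical, the rows are assumed to be already parsed into (name, class, subject, marks) tuples, and columns appear in the order the subjects are first seen.

```python
from collections import defaultdict


def pivot(rows):
    """Pivot (name, class, subject, marks) rows into one row per (name, class)."""
    subjects = []  # column order follows first appearance in the data
    totals = defaultdict(lambda: defaultdict(int))  # (name, class) -> subject -> marks
    for name, cls, subject, marks in rows:
        if subject not in subjects:
            subjects.append(subject)
        totals[(name, cls)][subject] += marks
    header = ["Name", "Class"] + subjects
    body = [
        [name, cls] + [by_subject.get(s, 0) for s in subjects]  # 0 when no marks
        for (name, cls), by_subject in totals.items()
    ]
    return header, body


# Sample data taken from the question.
rows = [
    ("Andy", 1, "Maths", 10),
    ("John", 1, "Maths", 20),
    ("Mark", 2, "Maths", 20),
    ("Tony", 2, "Geo", 30),
]
header, body = pivot(rows)
print(header)
for row in body:
    print(row)
```

Marks for the same student and subject are summed, which mirrors the SUM aggregation in the Pig version.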

Romain.

romain-nio