1

Pig question:

I have my data set up the following way.

Function    Group   Home    Name
Rent        MX      1       John
Rent        MX      1       Jake
Rent        MX      1       Pat
Rent        DG      2       Jason
Rent        DG      6       Patrick
Rent        DG      6       Smith
Rent        DG      6       Joe

What I want to do is group by Function, Group, and Home, and then rank the rows within each group.

Function    Group   Home    Name      Rank
Rent        MX      1       John      1
Rent        MX      1       Jake      2
Rent        MX      1       Pat       3
Rent        DG      6       Patrick   1
Rent        DG      6       Smith     2
Rent        DG      6       Joe       3

The RANK function in Pig does not allow me to rank within a group. Any suggestions? A Jython UDF?

JohnMeek

3 Answers

1

Check out the Enumerate UDF in DataFu; it does this for you: http://datafu.incubator.apache.org/docs/datafu/1.1.0/datafu/pig/bags/Enumerate.html

DMulligan
0

I can give some pointers on this.

In the Cascading API, I used a Buffer, which lets you iterate over the values in each group.

I read that Cascading also has an API for Jython developers; you may want to explore that.

Balaswamy Vaddeman
0

OK, this worked:

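# Jython UDF: prepend a rank to each tuple in a grouped bag.
def num_bag(input):
    output = []
    # enumerate() counts from 0, so add 1 to make ranks start at 1
    for rank, item in enumerate(input):
        output.append(tuple([rank + 1] + list(item)))
    return output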
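
For anyone who wants to check the logic outside Pig, here is a minimal plain-Python sketch using the sample rows from the question. The helper `rank_within_groups` is a hypothetical name introduced only for this illustration; it stands in for Pig's GROUP BY followed by a FOREACH that flattens the UDF result, and it is not the actual Pig script. It also assumes the rows arrive in the order you want them ranked; in Pig itself, bags are unordered unless you sort them inside the FOREACH.

```python
from collections import OrderedDict


def num_bag(bag):
    """Prepend a 1-based rank to every tuple in one group's bag."""
    return [tuple([rank + 1] + list(item)) for rank, item in enumerate(bag)]


# Sample rows from the question: (Function, Group, Home, Name)
rows = [
    ("Rent", "MX", 1, "John"),
    ("Rent", "MX", 1, "Jake"),
    ("Rent", "MX", 1, "Pat"),
    ("Rent", "DG", 2, "Jason"),
    ("Rent", "DG", 6, "Patrick"),
    ("Rent", "DG", 6, "Smith"),
    ("Rent", "DG", 6, "Joe"),
]


def rank_within_groups(rows):
    """Hypothetical stand-in for GROUP BY (Function, Group, Home) plus the UDF."""
    groups = OrderedDict()
    for row in rows:
        # The first three fields form the grouping key.
        groups.setdefault(row[:3], []).append(row)

    ranked = []
    for bag in groups.values():
        for item in num_bag(bag):
            # Move the rank from the front to the end to match the desired layout.
            ranked.append(item[1:] + (item[0],))
    return ranked


ranked = rank_within_groups(rows)
for row in ranked:
    print(row)
```

Each (Function, Group, Home) combination gets its own numbering starting at 1, so a group with a single row, such as Home 2, simply gets rank 1.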
JohnMeek