How do I find the MAX of a tuple in Pig?

My input file (myData.txt) and script look like this:

A,20
B,10
C,40
D,5

data = LOAD 'myData.txt' USING PigStorage(',') AS (key, value);
all = GROUP data ALL;
maxKey = FOREACH all GENERATE MAX(data.value);
DUMP maxKey;

This returns 40, but I want the full key-value pair: C,40. Any ideas?

user7337271
supyo

2 Answers

7

This works with Pig 0.10.0:

data = LOAD 'myData.txt' USING PigStorage(',') AS (key, value: long);
A = GROUP data ALL;
B = FOREACH A GENERATE MAX(data.value) AS val;
C = FILTER data BY value == (long)B.val;
DUMP C;
Frederic
  • Just a heads-up: while computing 'C' data should be filtered by B.val instead of C.val – Zibi Jul 15 '15 at 10:37
  • I'd like to second @Zibi. The last declaration should be `C = FILTER data BY value == (long)B.val;`, not `C = FILTER data BY value == (long)C.val;`. Thanks for the solution @Frederic Schmaljohann. That worked for me on Pig 0.10. – TheWalkingData Mar 08 '16 at 21:23
3

Try this:

data = LOAD 'myData.txt' USING PigStorage(',') AS (key: chararray, value: int);

sorted = ORDER data BY value DESC;

limited = LIMIT sorted 1;

projected = FOREACH limited GENERATE key, value;

DUMP projected;
Ruslan
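
Another option, sketched here as an assumption rather than taken from either answer, is Pig's built-in TOP function, which keeps the n tuples with the largest value in a given column of a bag. The alias names `grouped` and `top1` are illustrative, and the column index 1 assumes `value` is the second field (TOP counts columns from 0). Unlike the FILTER approach, this keeps a single row even if several keys share the maximum.

```pig
-- Load with an explicit numeric type so values compare as numbers.
data = LOAD 'myData.txt' USING PigStorage(',') AS (key: chararray, value: long);

-- Put every row into one bag so TOP can scan the whole data set.
grouped = GROUP data ALL;

-- TOP(n, column, bag): keep the 1 tuple with the largest value in column 1 (0-based).
top1 = FOREACH grouped GENERATE FLATTEN(TOP(1, 1, data));

DUMP top1;
```

For the sample file in the question, this should produce the full C,40 row rather than just the maximum value.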