I want Hadoop (using Streaming and Python) to sort the mapper output by its first two key fields.

My mapper prints its output as follows: print '%s\t%s\t%s' % (num1, num2, value)

I want my reducers to receive this data sorted by num1 and then num2, so that these outputs:

2   1   C
1   2   A
10  3   D
1   10  B

are delivered to reducers like so (assuming we have 3 reducers):

1   2   A
1   10  B
-----------
2   1   C
------------
10  3   D

I have tried setting the mapred.text.key.partitioner.options option to -k1n,1 -k2n,2, but this doesn't seem to work.

Any ideas?

I basically want Hadoop to perform the equivalent of this Unix sort: sort -k1n,1 -k2n,2

I am using Hadoop 0.20.2.

Thanks

Mo.

1 Answer

This does not answer your question directly, but here is a workaround: if you simply concatenate num1 and num2 into a single key in your mapper output, the default sort will do the trick. Just be careful with the printed format: you need to zero-pad the numbers to a fixed width, because the default sort compares keys as text (e.g. 0002 precedes 0010, but 2 follows 10).
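
A minimal sketch of that idea, assuming both numbers are non-negative integers. The names make_key, format_record and WIDTH are made up for this illustration, and the width of 10 digits is an assumption: it has to be at least as large as the number of digits in your biggest key.

```python
# Assumed padding width; must cover the largest num1/num2 you expect.
WIDTH = 10

def make_key(num1, num2, width=WIDTH):
    """Concatenate two non-negative integers into one fixed-width, zero-padded key."""
    # '%0*d' takes the width from the argument tuple, so (4, 2) gives '0002'.
    return '%0*d%0*d' % (width, num1, width, num2)

def format_record(num1, num2, value):
    """Build one mapper output line: padded key, a tab, then the value."""
    return '%s\t%s' % (make_key(num1, num2), value)

if __name__ == '__main__':
    # The four records from the question, in their original order.
    records = [(2, 1, 'C'), (1, 2, 'A'), (10, 3, 'D'), (1, 10, 'B')]
    lines = [format_record(*r) for r in records]
    # A plain text sort stands in for the default sort of the keys here.
    for line in sorted(lines):
        print(line)
```

With fixed-width keys, text order and numeric order agree, so the records come out in the order A, B, C, D. The reducer then has to split the key back into its two halves if it needs num1 and num2 separately.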

Yann