I would like Hadoop (using Streaming and Python) to sort the output of the mapper by the first two keys.
My mapper prints as follows: print '%s\t%s\t%s' % (num1, num2, value)
I want my reducers to receive this data sorted numerically by num1 and then by num2, so that these outputs:
2 1 C
1 2 A
10 3 D
1 10 B
are delivered to reducers like so (assuming we have 3 reducers):
1 2 A
1 10 B
-----------
2 1 C
-----------
10 3 D
I have tried setting the mapred.text.key.partitioner.options option to -k1n,1 -k2n,2, but this doesn't seem to work.
Any ideas?
I basically want Hadoop to perform the equivalent of this Unix sort: sort -k1n,1 -k2n,2
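To make the ordering I am after concrete, here is a minimal plain-Python sketch. It is not Hadoop code and only illustrates the expected result on the sample data above. The names sort_key, desired_order and group_by_num1 are made up for this illustration, and it assumes the fields are tab-separated as in my mapper.

```python
from itertools import groupby

# Sample mapper output (tab-separated: num1, num2, value), copied from the example above.
mapper_output = ["2\t1\tC", "1\t2\tA", "10\t3\tD", "1\t10\tB"]


def sort_key(line):
    # Compare both key fields as integers, like the "n" flag of Unix sort.
    num1, num2, _value = line.split("\t")
    return (int(num1), int(num2))


def desired_order(lines):
    # Sort by num1 first, then by num2, both numerically.
    return sorted(lines, key=sort_key)


def group_by_num1(lines):
    # Split the sorted lines into one group per distinct num1,
    # mirroring how I would like the records spread over the reducers.
    ordered = desired_order(lines)
    return [list(group) for _, group in groupby(ordered, key=lambda l: sort_key(l)[0])]


if __name__ == "__main__":
    for group in group_by_num1(mapper_output):
        for line in group:
            print(line.replace("\t", " "))
        print("-----------")
```

A plain text sort would put 10 before 2, which is why the comparison has to be numeric on both fields.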
The version of Hadoop I am using is 0.20.2
Thanks