I have some data coming out of the reducer that looks like this:
9,2 3
5,7 2
2,3 0
1,5 3
6,3 0
4,2 2
7,1 1
I would like to sort these records numerically by the value in the second column, like this:
2,3 0
6,3 0
7,1 1
5,7 2
4,2 2
1,5 3
9,2 3
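To make the ordering precise, here is a minimal Python sketch of what I mean. The function names are mine and only for illustration; it assumes each record is "key value" separated by whitespace, with an integer in the second column. Records that tie on the second column may come out in a different relative order than in my listing above.

```python
# Sample reducer output from above: one "key value" record per line.
lines = ["9,2 3", "5,7 2", "2,3 0", "1,5 3", "6,3 0", "4,2 2", "7,1 1"]

def second_column(line):
    # Split on whitespace and read the second field as an integer.
    return int(line.split()[1])

def sort_by_second_column(rows):
    # sorted() is stable: records with equal values keep their input order.
    return sorted(rows, key=second_column)

for row in sort_by_second_column(lines):
    print(row)
```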
When I run my program locally, I use:
sort -k2,2n
But I don't know how to do the same thing on Hadoop. I've tried several options, none of which worked, such as:
-D mapreduce.partition.keycomparator.options=-k2,2n
Moreover, I would like all records with the same key (the value in the second column) to go to the same reducer. So in this case:
2,3 0
and
6,3 0
should be processed by the same reducer.
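The grouping I am after can be sketched like this in Python. This is only an illustration of the requirement, not how Hadoop assigns partitions internally; `partition` and `num_reducers` are hypothetical names, and the modulo rule is just one simple way to map equal keys to the same reducer.

```python
def partition(line, num_reducers):
    # Use the integer in the second column as the partitioning key.
    key = int(line.split()[1])
    # Equal keys always map to the same reducer index.
    return key % num_reducers

records = ["9,2 3", "5,7 2", "2,3 0", "1,5 3", "6,3 0", "4,2 2", "7,1 1"]

# Group records by the reducer index they would be sent to.
by_reducer = {}
for record in records:
    by_reducer.setdefault(partition(record, 2), []).append(record)

for reducer_index in sorted(by_reducer):
    print(reducer_index, by_reducer[reducer_index])
```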
Any idea which options I should pass to Hadoop?
Thank you in advance!