This is an MRJob implementation of a simple MapReduce sort. In beta.py:
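```python
from mrjob.job import MRJob


class Beta(MRJob):

    def mapper(self, _, line):
        """Emit the second column as the key and the first column as the value."""
        l = line.split()
        yield l[1], l[0]

    def reducer(self, key, values):
        # Keys arrive in sorted order; keep only the first value for each key.
        yield key, next(iter(values))


if __name__ == '__main__':
    Beta.run()
```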
I run it on the following input:

```
1 1
2 4
3 8
4 2
4 7
5 5
6 10
7 11
```
One can run it with:

```
cat <filename> | python beta.py
```
The problem is that the output is sorted as if the keys were strings (which they are here, since `line.split` returns strings), so "10" comes before "2". The output is:

```
"1" "1"
"10" "6"
"11" "7"
"2" "4"
"4" "2"
"5" "5"
"7" "4"
"8" "3"
```
The output that I want is:

```
"1" "1"
"2" "4"
"4" "2"
"5" "5"
"7" "4"
"8" "3"
"10" "6"
"11" "7"
```
I am not sure whether this can be fixed by changing the protocols in MRJob, since protocols are job-specific rather than step-specific.
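The ordering difference can be reproduced in plain Python without mrjob. This is only a sketch that assumes the shuffle compares keys as plain strings, which is what the output above shows; `SAMPLE`, `mapper`, `lexicographic` and `numeric` are illustrative names, not part of mrjob.

```python
# Hypothetical local simulation of Beta's map -> sort step, assuming the
# shuffle orders keys by comparing them as strings (as in the output above).
SAMPLE = """1 1
2 4
3 8
4 2
4 7
5 5
6 10
7 11"""


def mapper(line):
    # Same logic as Beta.mapper: emit (second column, first column).
    first, second = line.split()
    return second, first


pairs = [mapper(line) for line in SAMPLE.splitlines()]

# String keys compare character by character, so "10" sorts before "2".
lexicographic = sorted(pairs)

# Comparing the keys as integers gives the order that is wanted.
numeric = sorted(pairs, key=lambda kv: int(kv[0]))

for key, value in lexicographic:
    print(key, value)
```

Sorting on `int(key)` gives the wanted order here, but in a real MapReduce job the framework sorts between the mapper and the reducer, so by default the key itself has to sort correctly as a string.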
EDIT (Solution): I have found an answer to this one. The idea is to pad every number with leading zeros so that every number has the same number of digits as the largest number; once all keys have the same length, string order and numeric order agree. At least, that is what I remember from my classes. I cannot post this as an answer right now because the site won't permit it, but it is the only solution I have. If anyone has something more transparent and simpler, please share.
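A minimal sketch of the zero-padding idea in plain Python. `WIDTH` and `pad_key` are hypothetical names, and the width of 2 is an assumption taken from the sample data, where the largest key is 11. Inside `Beta.mapper` the same idea would look like `yield l[1].zfill(WIDTH), l[0]`.

```python
# Hypothetical constant: must be at least the number of digits
# in the largest key that can appear in the data.
WIDTH = 2


def pad_key(number: str, width: int = WIDTH) -> str:
    # "8" -> "08": every key now has the same length, so string order
    # matches numeric order (this only holds for non-negative integers).
    return number.zfill(width)


keys = ["1", "10", "11", "2", "4", "5", "7", "8"]
padded = sorted(pad_key(k) for k in keys)
print(padded)  # ['01', '02', '04', '05', '07', '08', '10', '11']

# The padding can be stripped again after sorting, e.g. in the reducer.
unpadded = [str(int(k)) for k in padded]
```

The main drawback is that the maximum width has to be known (or over-estimated) before the job runs; a key longer than `WIDTH` is left unpadded by `zfill` and will sort incorrectly again.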