Is there any way to sort the output of a reducer function using mrjob?
I think that the input to a reducer function is sorted by key, and I tried to exploit this to sort the output using a second reducer, as shown below. I know the values are numeric. I want to count the occurrences of each key and sort the keys by this count:
    def mapper_1(self, key, line):
        key = ...  # extract key from the line
        yield (key, 1)

    def reducer_1(self, key, values):
        # total number of occurrences of this key
        yield key, sum(values)

    def mapper_2(self, key, count):
        # zero-pad the count so that string order matches numeric order
        yield ('%020d' % int(count), key)

    def reducer_2(self, count, keys):
        for key in keys:
            yield key, int(count)
But its output is not correctly sorted! I suspected that this weird behavior is due to the ints being handled as strings, so I tried zero-padding them as this link says, but it didn't work!
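To illustrate what I expected the zero-padding to achieve, here is a minimal sketch in plain Python (no mrjob involved; the counts are made up for illustration and are not from my real data). It assumes the counts are non-negative and fit in 20 digits:

```python
# Hypothetical counts, only to show the ordering behaviour.
counts = [3, 12, 100, 7]

# Plain string sorting compares character by character, so "100" < "12" < "3".
unpadded = sorted(str(c) for c in counts)

# With a fixed width and leading zeros, string order matches numeric order.
padded = sorted('%020d' % c for c in counts)
padded_as_ints = [int(p) for p in padded]

print(unpadded)        # ['100', '12', '3', '7']
print(padded_as_ints)  # [3, 7, 12, 100]
```

So, as far as I understand, the padded keys produced by mapper_2 should already compare in numeric order when sorted as strings.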
IMPORTANT NOTE: When I use the debugger to inspect the order of the output of reducer_2, the order is correct, but what is printed as output is something else!
IMPORTANT NOTE 2: On another computer, the same program run on the same data returns output sorted as expected!