I am new to Python programming, so apologies in advance if this is easily solved. I want to use MapReduce to process a CSV file that contains integer values; the output must be the maximum value in the file. This is the script I've written so far:
from mrjob.job import MRJob


class MRWordCounter(MRJob):

    def mapper(self, key, line):
        for word in line.split(','):
            yield 'MAXIMUM VALUE IN FILE:', int(word)

    def reducer(self, word, occurrences):
        yield word, max(occurrences)


if __name__ == '__main__':
    MRWordCounter.run()
The script works: it maps and reduces to the maximum value and prints it as output. However, I suspect that yielding 'MAXIMUM VALUE IN FILE:' from the mapper is incorrect, since the mapper sends that same string to the reducer as the key for every value. Can someone confirm whether this is the wrong way to implement it, and recommend how I can fix it?
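
To make the data flow in question concrete, here is a minimal plain-Python sketch of what the mapper and reducer do, without mrjob. The helper names (`map_line`, `reduce_max`, `run_local`) are hypothetical and not part of mrjob, and `run_local` is only a rough stand-in for the shuffle step, assuming all values for a key reach a single reducer call. It also skips empty fields, which the original script does not.

```python
from collections import defaultdict

# The constant key used in the script above.
KEY = 'MAXIMUM VALUE IN FILE:'


def map_line(line):
    """Emit one (key, value) pair per comma-separated integer in the line."""
    for field in line.split(','):
        field = field.strip()
        if field:  # skip empty fields, e.g. from a trailing comma
            yield KEY, int(field)


def reduce_max(key, values):
    """Collapse all values that share a key into their maximum."""
    yield key, max(values)


def run_local(lines):
    """Hypothetical stand-in for the shuffle: group values by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            grouped[key].append(value)
    results = []
    for key, values in grouped.items():
        results.extend(reduce_max(key, values))
    return results


print(run_local(['3,17,5', '42,8', '1,9,23']))
# prints [('MAXIMUM VALUE IN FILE:', 42)]
```

Because every pair carries the same key, the grouping step produces exactly one group, so the reducer sees all values at once and a single maximum comes out. In this sketch the key acts only as a label on the final output line.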