I'm new to MRJob and MapReduce, and I have a question about the classic word-count example for MRJob:
```python
from mrjob.job import MRJob


class MRWordCounter(MRJob):
    def mapper(self, key, line):
        for word in line.split():
            yield word, 1

    def reducer(self, word, occurrences):
        yield word, sum(occurrences)


if __name__ == '__main__':
    MRWordCounter.run()
```
Is it possible to store the `(word, sum(occurrences))` pairs in a dictionary instead of yielding them, so that I can access them later? What would the syntax be? Thanks!
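One common approach is to keep yielding from the reducer and build the dictionary from the job's output on the driver side. On Hadoop (and generally with mrjob's non-inline runners), the reducer runs in a separate process from the script that launched the job, so a dict filled in on `self` inside `reducer` isn't reliably visible afterwards. The sketch below does not use mrjob at all: `mapper`, `reducer`, and `run_word_count` are hypothetical plain-Python stand-ins that mimic the map, shuffle/sort, and reduce phases locally, just to show that turning the yielded pairs into a dict is a one-liner with `dict(...)`.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    # Same logic as MRWordCounter.mapper: emit (word, 1) for every word.
    for word in line.split():
        yield word, 1


def reducer(word, occurrences):
    # Same logic as MRWordCounter.reducer: sum the 1s for one word.
    yield word, sum(occurrences)


def run_word_count(lines):
    """Hypothetical local stand-in for the map -> shuffle -> reduce cycle."""
    # Map phase: collect every (word, 1) pair from every line.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort phase: sort by key so equal words are adjacent, then group.
    pairs.sort(key=itemgetter(0))
    for word, group in groupby(pairs, key=itemgetter(0)):
        # Reduce phase: the reducer gets the word and an iterator of counts.
        yield from reducer(word, (count for _, count in group))


# The reducer still yields; the caller turns the yielded pairs into a dict.
counts = dict(run_word_count(["the cat sat", "the cat ran"]))
print(counts)  # {'cat': 2, 'ran': 1, 'sat': 1, 'the': 2}
```

With real mrjob, the equivalent is to run the job programmatically and parse its output. Assuming mrjob 0.6 or later, that looks roughly like `with job.make_runner() as runner: runner.run(); counts = dict(job.parse_output(runner.cat_output()))`, where `job = MRWordCounter(args=[...])`. Older releases used `runner.stream_output()` together with `job.parse_output_line(line)` instead. Check the "Runners" section of the docs for your installed version.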