I have a program that runs a mapper and a reducer n times consecutively. However, in each iteration, the mapper computes, for each key-value pair, a value that depends on n.
    from mrjob.job import MRJob

    class MRWord(MRJob):

        def mapper_init_def(self):
            # per-key state, created before the first step's mapper runs
            self.count = {}

        def mapper_count(self, key, value):
            self.count[key] = 0
            print self.count[key]
            # prints correctly
            yield key, value

        def mapper_iterate(self, key, value):
            yield key, value
            print self.count[key]
            # error: AttributeError is raised here

        def reducer_iterate(self, key, value):
            yield key, value

        def steps(self):
            return [
                self.mr(mapper_init=self.mapper_init_def, mapper=self.mapper_count),
                self.mr(mapper=self.mapper_iterate, reducer=self.reducer_iterate)
            ]

    if __name__ == '__main__':
        MRWord.run()
I defined a two-step job in which the first step sets an instance attribute, self.count. The program fails in the second step with AttributeError: 'MRWord' object has no attribute 'count'. It seems each step runs in its own independent MRWord object, so the attribute cannot be shared between steps. Is there another way to accomplish this?
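To make the suspected cause concrete, here is a minimal sketch in plain Python (it does not use mrjob). It assumes that each step is executed by a fresh object, which is what the error suggests. The names Step and run_step are hypothetical stand-ins, not part of the mrjob API. It also shows one possible workaround: carrying the per-key state inside the emitted value instead of on self.

```python
class Step(object):  # hypothetical stand-in for the job class
    def mapper_count(self, key, value):
        # State stored on self lives only on this instance.
        if not hasattr(self, "count"):
            self.count = {}
        self.count[key] = 0
        # Carry the per-key state forward inside the emitted value.
        yield key, (value, self.count[key])

    def mapper_iterate(self, key, value_and_count):
        value, count = value_and_count
        # The state arrives with the record, so no shared attribute is needed.
        yield key, (value, count + 1)


def run_step(method_name, records):
    """Run one step with a brand-new Step instance (assumed behaviour)."""
    step = Step()
    out = []
    for key, value in records:
        out.extend(getattr(step, method_name)(key, value))
    return out, step


records = [("a", 1), ("b", 2)]
after_first, first_step = run_step("mapper_count", records)
after_second, second_step = run_step("mapper_iterate", after_first)

# The second instance never saw the attribute set by the first one.
shared = hasattr(second_step, "count")
```

In a real job the values would pass through a serialization protocol between steps, so a tuple emitted by one step may arrive as a list in the next; unpacking as above works in either case.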