I want to index the output of my reducer like this:
1 "EZmocAborM6z66rTzeZxzQ"
2 "FIk4lQQu1eTe2EpzQ4xhBA"
3 "myql3o3x22_ygECb8gVo7A"
4 "ojovtd9c8GIeDiB8e0mq2w"
5 "uVEoZmmL9yK0NMgadLL0CQ"
My Python MRJob code:
import json

from mrjob.job import MRJob


class MRUserDic(MRJob):
    # Counter used to number the keys; it lives on the job instance.
    count = 1

    def mapper(self, _, line):
        line = json.loads(line)
        yield line['user_id'], 1

    def reducer(self, key, values):
        yield self.count, key
        self.count += 1


if __name__ == '__main__':
    MRUserDic.run()
But this results in:
1 "EZmocAborM6z66rTzeZxzQ"
2 "FIk4lQQu1eTe2EpzQ4xhBA"
3 "myql3o3x22_ygECb8gVo7A"
1 "ojovtd9c8GIeDiB8e0mq2w"
2 "uVEoZmmL9yK0NMgadLL0CQ"
I know this happens because the reducers run on different machines, so each one starts with its own copy of count.
Is there any way to share the count variable among the reducers?
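To illustrate the restart, here is a plain-Python sketch that does not use mrjob at all. The `ReducerTask` class and the 3/2 split of keys are hypothetical; they are only assumed so that the result matches the output above, and the real partitioning is decided by the framework.

```python
class ReducerTask:
    """Hypothetical stand-in for one reducer process (not an mrjob class)."""

    def __init__(self):
        # Every task starts from its own private counter.
        self.count = 1

    def reduce(self, keys):
        out = []
        for key in keys:
            out.append((self.count, key))
            self.count += 1
        return out


keys = [
    "EZmocAborM6z66rTzeZxzQ",
    "FIk4lQQu1eTe2EpzQ4xhBA",
    "myql3o3x22_ygECb8gVo7A",
    "ojovtd9c8GIeDiB8e0mq2w",
    "uVEoZmmL9yK0NMgadLL0CQ",
]

# Assumed partitioning: three keys go to one task, two to another.
partitions = [keys[:3], keys[3:]]

results = []
for part in partitions:
    results.extend(ReducerTask().reduce(part))

# For comparison: a single task sees every key and numbers them consecutively.
single = ReducerTask().reduce(keys)

for index, key in results:
    print(index, key)
```

In this sketch the two separate tasks number their keys 1, 2, 3 and 1, 2, while the single task numbers them 1 through 5. That suggests the indices can only be consecutive if all keys pass through one place, which gives up the parallelism of the reduce phase.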