I know that reduce tasks must run independently and in isolation. But for mappers, it looks like there might be a way for them to communicate with each other? If so, please explain.
1 Answer
Mappers don't communicate with each other. This is intentional: it ensures that the reliability of each map task is governed solely by the reliability of the machine where that map task is running.
See this excerpt from the YDN tutorial for a better understanding:
If Mappers and Reducers had individual identities and communicated with one another or the outside world, then restarting a task would require the other nodes to communicate with the new instances of the map and reduce tasks, and the re-executed tasks would need to reestablish their intermediate state. This process is notoriously complicated and error-prone in the general case. MapReduce simplifies this problem drastically by eliminating task identities or the ability for task partitions to communicate with one another. An individual task sees only its own direct inputs and knows only its own outputs, to make this failure and restart process clean and dependable.
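To make the restart argument concrete, here is a minimal sketch in plain Python. It is not Hadoop code: `map_task`, `run_with_retry`, and `reduce_counts` are hypothetical names invented for illustration, and the "failure" is simulated. It only shows that when a map task depends on nothing but its own input split, re-running it from scratch gives the same final result.

```python
from collections import Counter


def map_task(split):
    """Hypothetical map task: sees only its own split, returns only its own output."""
    return [(word, 1) for line in split for word in line.split()]


def run_with_retry(task, split, fail_first_attempt=False, max_attempts=2):
    """Hypothetical scheduler: re-run a failed task from scratch on the same split."""
    for attempt in range(1, max_attempts + 1):
        try:
            if fail_first_attempt and attempt == 1:
                raise RuntimeError("simulated machine failure")
            return task(split)
        except RuntimeError:
            # No other task holds state for this one, so a plain re-run is enough.
            continue
    raise RuntimeError("task failed on every attempt")


def reduce_counts(map_outputs):
    """Sum the (word, 1) pairs emitted by all map tasks."""
    totals = Counter()
    for output in map_outputs:
        for word, count in output:
            totals[word] += count
    return dict(totals)


splits = [["a b a"], ["b c"]]

# Run once with no failures, then again with the first map task failing once.
clean = reduce_counts(run_with_retry(map_task, s) for s in splits)
retried = reduce_counts(
    run_with_retry(map_task, s, fail_first_attempt=(i == 0))
    for i, s in enumerate(splits)
)
```

Both runs produce the same counts, because nothing outside the failed task had to be told about the restart.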
P.S.: May I ask what made you think otherwise?

Thank you for the thorough explanation on this. I was looking at the MultithreadedMapper, where I came across this thought. – Clark Jan 29 '14 at 14:55