I am beginning to learn MapReduce with the mrjob Python package. The mrjob documentation lists the following snippet as an example MapReduce script.
"""The classic MapReduce job: count the frequency of words.
"""
from mrjob.job import MRJob
import re
WORD_RE = re.compile(r"[\w']+")
class MRWordFreqCount(MRJob):
def mapper(self, _, line):
for word in WORD_RE.findall(line):
yield (word.lower(), 1)
def combiner(self, word, counts):
yield (word, sum(counts))
def reducer(self, word, counts):
yield (word, sum(counts))
if __name__ == '__main__':
MRWordFreqCount.run()
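For reference, a minimal sketch of driving this job programmatically on a local input might look like the following. The file names word_count.py and input.txt are placeholders I made up, and it assumes a reasonably recent mrjob release that provides runner.cat_output() and MRJob.parse_output().

# Minimal sketch of a separate driver script, assuming the snippet above is
# saved as word_count.py and that a local file input.txt exists (both names
# are placeholders).
from word_count import MRWordFreqCount

if __name__ == '__main__':
    job = MRWordFreqCount(args=['input.txt'])
    with job.make_runner() as runner:
        runner.run()
        # Turn the raw job output back into (word, count) pairs
        for word, count in job.parse_output(runner.cat_output()):
            print(word, count)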
I understand how this algorithm works in general: what the combiner does (and that it is not guaranteed to run), and how the reducers operate on the shuffled and sorted values coming from the mappers and combiners.
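To make the part I do follow concrete, here is a toy single-process simulation (plain Python, not mrjob) of the mapper, the shuffle/sort grouping, and the reducer on two made-up lines of input:

import re
from collections import defaultdict

WORD_RE = re.compile(r"[\w']+")

# Two made-up input lines, just for illustration
lines = ["the cat sat", "the dog sat"]

# Mapper step: one (word, 1) pair per word occurrence
mapped = [(word.lower(), 1) for line in lines for word in WORD_RE.findall(line)]
# -> [('the', 1), ('cat', 1), ('sat', 1), ('the', 1), ('dog', 1), ('sat', 1)]

# Shuffle/sort step: group all values by key
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)
# -> {'the': [1, 1], 'cat': [1], 'sat': [1, 1], 'dog': [1]}

# Reducer step: sum the grouped counts for each key
reduced = {word: sum(counts) for word, counts in grouped.items()}
print(reduced)  # {'the': 2, 'cat': 1, 'sat': 1, 'dog': 1}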
However, I do not understand how the reducers come up with a single value. Aren't there separate reduce processes running on different nodes of the cluster? How do these reduce functions arrive at a single answer when the partitioners send only certain shuffled key-value pairs to certain reducers?
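Here is how I currently picture the partitioning step; this is just a toy sketch of a hash partitioner (not actual Hadoop or mrjob code), with the number of reducers made up for illustration:

# Toy sketch of a hash partitioner: every occurrence of a given key is
# routed to the same reducer, so that reducer sees all counts for its keys.
num_reducers = 2  # made-up number, just for illustration

def partition(word):
    return hash(word) % num_reducers

pairs = [('the', 1), ('cat', 1), ('sat', 1), ('the', 1), ('dog', 1), ('sat', 1)]

buckets = {r: [] for r in range(num_reducers)}
for word, count in pairs:
    buckets[partition(word)].append((word, count))

for reducer_id, bucket in sorted(buckets.items()):
    print(reducer_id, bucket)
# Both ('the', 1) pairs land in the same bucket, so a single reducer
# computes the full count for 'the' -- but the totals for different words
# end up on different reducers, which is the part that confuses me.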
I guess I'm confused about how the outputs from the various reducers are combined into a single answer.