I agree that I have only one Reducer function per job. However, when I run Hadoop as a simulation in NetBeans (not in distributed mode), it creates one reducer task for each unique key. For instance, if I have only 3 keys (k1, k2, k3), it calls the reduce function 3 times, once for each of these keys (a minimal sketch of the reducer I mean follows the example below).
example:
Reducer: key=k1
values which correspond to k1
Reducer: key=k2
values which correspond to k2
Reducer: key=k3
values which correspond to k3
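To make concrete what I mean, this is a minimal sketch of the kind of reducer I am running (class name and types are just my placeholders, not the exact code): the framework invokes reduce() once per distinct key, which is what produces the "Reducer: key=..." lines above in my simulation.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: reduce() is called once per distinct key, and within each
// call I can iterate only over the values that belong to that key.
public class MyReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        System.out.println("Reducer: key=" + key);
        for (Text value : values) {
            context.write(key, value);   // values which correspond to this key
        }
    }
}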
Therefore, the values that correspond to key k1 can be accessed only from that reducer task, and the same goes for the k2 and k3 values. What I want to do is gather k1 and k2 into the same task (assuming these two keys have something in common) so that I can access all of their values (those corresponding to k1 and k2) from a single reducer task; a sketch of what I have in mind follows.
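What I imagine is a custom Partitioner along these lines (the class name and the literal key strings are placeholders of mine, not code from an actual job):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical sketch: send k1 and k2 to the same partition so that one
// reduce task receives the value groups of both keys; every other key
// goes to the second partition.
public class GroupingPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        String k = key.toString();
        if (k.equals("k1") || k.equals("k2")) {
            return 0;
        }
        // Guard for jobs configured with a single reduce task.
        return 1 % numPartitions;
    }
}

In the driver I would then call job.setPartitionerClass(GroupingPartitioner.class) and job.setNumReduceTasks(2). As far as I understand, reduce() would still be called once per key even inside that task, but at least the k1 and k2 value groups would end up in the same reducer task, which is what I am after.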
In addition, I read this example and thought I understood it, until I ran it and saw that it again creates 2 reducer tasks, not 3, which is the number of age groups in the partitioner. I sketch my reconstruction of that partitioner after the output below.
output example:
Reducer: female
Monica<tab>56<tab>92
Kristine<tab>38<tab>53
Alice<tab>23<tab>45
Nancy<tab>7<tab>98
Mary<tab>6<tab>93
Clara<tab>87<tab>72
Reducer: male
James<tab>34<tab>79
Jacob<tab>7<tab>23
Alex<tab>52<tab>69
Bob<tab>34<tab>89
Chris<tab>67<tab>97
Adam<tab>9<tab>37
Connor<tab>25<tab>27
Daniel<tab>78<tab>95
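For reference, this is roughly how I understood the partitioner from that example (my own reconstruction from memory, not the exact code): the map output key is the gender, the value holds name/age/score, and the partition is chosen by age bucket, with the job set to three reduce tasks via job.setNumReduceTasks(3).

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// My reconstruction (assumed value format: name<tab>age<tab>score):
// three age buckets mapped to partitions 0, 1 and 2.
public class AgeGroupPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        int age = Integer.parseInt(value.toString().split("\t")[1]);
        if (numReduceTasks == 0) {
            return 0;
        }
        if (age <= 20) {
            return 0;
        } else if (age <= 30) {
            return 1 % numReduceTasks;
        } else {
            return 2 % numReduceTasks;
        }
    }
}

With three partitions I expected three reducer tasks, yet the output above only ever shows the two gender keys, which is what confuses me.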