
I am using Hadoop version 0.20.0.

I have set the combiner class, and my program runs successfully.

However, I found that about 5% of my data did not pass through the combiner after leaving the mapper; it went directly to the reducer. Why does this happen?

greedybuddha
JoJo

1 Answer


A note on the implementation of combiners in Hadoop: by default, the execution framework reserves the right to use combiners at its discretion. In reality, this means that a combiner may be invoked zero, one, or multiple times. In addition, combiners in Hadoop may actually be invoked in the reduce phase, i.e., after key-value pairs have been copied over to the reducer, but before the user reducer code runs. As a result, combiners must be carefully written so that they can be executed in these different environments.

This is quoted from section 2.4 of the book below (available as a PDF):

Data-Intensive Text Processing with MapReduce
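
In practice this means some map output can reach the reducer without ever being combined, so the job has to produce the same result whether the combiner runs zero, one, or several times. Here is a minimal sketch of that property in plain Java, with no Hadoop dependency. The class `CombinerSafety` and its `combine` and `reduce` methods are hypothetical stand-ins for a combiner and a reducer, not Hadoop APIs, and a simple sum is assumed as the operation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerSafety {
    // Hypothetical stand-in for a combiner: collapses the partial counts
    // seen for one key into a single partial count.
    static List<Integer> combine(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        List<Integer> out = new ArrayList<>();
        out.add(sum);
        return out;
    }

    // Hypothetical stand-in for the reducer: sums whatever it receives,
    // whether those values are raw map output or combiner output.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> mapOutput = Arrays.asList(1, 1, 1, 1, 1);

        // Combiner never invoked: the reducer sees raw map output.
        int zeroTimes = reduce(mapOutput);

        // Combiner invoked once over all values.
        int once = reduce(combine(mapOutput));

        // Combiner invoked on only part of the data; the rest goes
        // straight to the reducer (the situation in the question).
        List<Integer> mixed = new ArrayList<>(combine(mapOutput.subList(0, 3)));
        mixed.addAll(mapOutput.subList(3, 5));
        int partial = reduce(mixed);

        // Combiner invoked twice in a row.
        int twice = reduce(combine(combine(mapOutput)));

        System.out.println(zeroTimes + " " + once + " " + partial + " " + twice);
        // prints "5 5 5 5"
    }
}
```

A sum works here because it is associative and commutative, and the combiner's output has the same type as its input. An operation such as a mean would not survive partial combining unless the combiner emitted (sum, count) pairs instead.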

Engineiro