
When using one of the aggregator functions in place of a reducer, will Hadoop be smart enough to use the same function for the combiner phase?

greedybuddha
spacemonkey
  • You have to directly specify your combiner. So if you have to set it, what is there to be smart about? I must be missing something – greedybuddha May 14 '13 at 22:51
  • Well, by smart I mean: if I skip the `-combiner` option, will it perform the combine phase based on the aggregator function? Or do I need to specify it like `-combiner aggregate -reducer aggregate`? – spacemonkey May 15 '13 at 18:56
  • are you talking about cascade aggregators? – greedybuddha May 15 '13 at 19:21
  • What I am trying to figure out is whether `-combiner aggregate -reducer aggregate` is the same as just `-reducer aggregate`, since Hadoop might be smart enough to optimize cases like that itself. – spacemonkey May 16 '13 at 00:04

1 Answer


They partly serve the same purpose, but the aggregator is more generic and can be used in cases where a combiner cannot.

So the answer is a definite no: the aggregator will not be used as a combiner automatically. If you want it to act as a combiner, you have to specify it as one explicitly.

Quoting the Cascading documentation: "Combiners are limited to Associative and Commutative functions only, like 'sum' and 'max'. And in order to work, values emitted from the Map task must be serialized, sorted (deserialized and compared), deserialized again and operated on".
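To illustrate why combiners are restricted to associative and commutative functions, here is a small Python simulation. It is not Hadoop or Cascading code: `run_job` and `mean` are hypothetical helpers, and each inner list stands in for the values one map task emitted for a single key. Reusing `sum` or `max` as the combiner gives the same result as running the reducer alone, while reusing a mean does not.

```python
from typing import Callable, List, Optional

# Hypothetical helper: simulate one key's values flowing from several map
# tasks, through an optional combiner (applied per map task), to the reducer.
def run_job(map_outputs: List[List[float]],
            reducer: Callable[[List[float]], float],
            combiner: Optional[Callable[[List[float]], float]] = None) -> float:
    if combiner is not None:
        # The combiner only ever sees the output of a single map task.
        partials = [combiner(values) for values in map_outputs if values]
    else:
        # No combiner: every raw value is shipped to the reducer.
        partials = [v for values in map_outputs for v in values]
    return reducer(partials)

def mean(values: List[float]) -> float:
    return sum(values) / len(values)

# Two map tasks with uneven output sizes for the same key.
map_outputs = [[1.0, 2.0, 3.0], [10.0]]

# sum and max are associative and commutative: combining first is safe.
print(run_job(map_outputs, sum), run_job(map_outputs, sum, combiner=sum))  # 16.0 16.0
print(run_job(map_outputs, max), run_job(map_outputs, max, combiner=max))  # 10.0 10.0

# mean is not safe to reuse: the mean of per-task means is (2.0 + 10.0) / 2,
# while the true mean of all four values is 16.0 / 4.
print(run_job(map_outputs, mean))                 # 4.0
print(run_job(map_outputs, mean, combiner=mean))  # 6.0
```

This is why a framework cannot blindly reuse an arbitrary reduce function as a combiner: the combiner's output must be something the reducer can consume as if it were raw values.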

greedybuddha