2

If we have a MapReduce job configured to run with only a single reducer, it seems logical that the Partitioner need not be invoked.

However, I just tried this, and it looks like the Partitioner is invoked even when the job is configured with a single reducer.

Any ideas why this would be required?

Sudarshan

2 Answers

2

It's because assigning a key/value pair to a particular reducer is the responsibility of the class playing the role of partitioner. Even if there is only one reducer, you still need a partitioner to assign the key/value pairs to that one reducer.

Adding default values or special-case "if there's only one reducer" logic would spread the partition-assignment behavior to places outside the partitioner, which isn't really good OO design.
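As a rough illustration of why no special case is needed, here is a minimal plain-Java sketch. It is a stand-in, not the Hadoop class: the class name `SingleReducerPartitionSketch` and the sample keys are made up for this example, and the formula follows the hash-modulo approach used by Hadoop's default `HashPartitioner`. With one reducer, the same code path simply yields partition 0 for every key.

```java
/** Plain-Java stand-in for a hash-based partitioner (hypothetical, not the Hadoop class). */
final class SingleReducerPartitionSketch {

    /** Maps a key to a partition index in the range [0, numReduceTasks). */
    static int getPartition(Object key, int numReduceTasks) {
        // Clear the sign bit so the modulo result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 1; // a job configured with a single reducer
        String[] keys = {"apple", "banana", "cherry"}; // sample keys, assumed for the demo
        for (String key : keys) {
            // Anything modulo 1 is 0, so every key lands in partition 0.
            System.out.println(key + " -> partition " + getPartition(key, numReduceTasks));
        }
    }
}
```

The caller never has to ask how many reducers there are; the partitioner alone owns that decision.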

Chris Gerken
  • When you wrote "you still need a reducer to assign the", you meant to say "you still need a **partitioner** to assign the", right? – Sudarshan Apr 16 '14 at 05:09
0

In most cases, skipping the partitioner would give the same result as invoking it when you have only one reducer. But if the partitioner would have thrown an exception or crashed the program for some other reason, not calling it could hide a bug in your program. Granted, this is a bit contrived, because any bug that surfaces in the partitioner would likely be easy to find elsewhere. Since calling it costs very little and skipping it brings no real benefit, there is no reason not to call it.
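A contrived sketch of that point, in plain Java rather than Hadoop code. Everything here is hypothetical: `validatingPartition` is an imagined partitioner that rejects malformed keys, and `routeWithShortcut` is an imagined framework optimization that bypasses the partitioner when there is one reducer.

```java
/** Hypothetical demo: a single-reducer shortcut can hide a failure the partitioner would raise. */
final class AlwaysCallPartitionerSketch {

    /** Imagined partitioner that validates the key before assigning a partition. */
    static int validatingPartition(String key, int numReduceTasks) {
        if (key == null || key.isEmpty()) {
            throw new IllegalArgumentException("malformed key");
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Imagined shortcut: skips the partitioner entirely when there is one reducer. */
    static int routeWithShortcut(String key, int numReduceTasks) {
        return numReduceTasks == 1 ? 0 : validatingPartition(key, numReduceTasks);
    }

    /** Always delegates to the partitioner, whatever the reducer count. */
    static int routeAlwaysCalling(String key, int numReduceTasks) {
        return validatingPartition(key, numReduceTasks);
    }

    public static void main(String[] args) {
        String badKey = ""; // a malformed key, assumed for the demo
        // The shortcut silently accepts the bad key.
        System.out.println("shortcut: partition " + routeWithShortcut(badKey, 1));
        try {
            routeAlwaysCalling(badKey, 1);
        } catch (IllegalArgumentException e) {
            // Always calling the partitioner surfaces the problem.
            System.out.println("always-call: rejected (" + e.getMessage() + ")");
        }
    }
}
```

With the shortcut, the job would behave differently at one reducer than at two, which is exactly the kind of change in behavior that argues for always calling the partitioner.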

aaronman
  • I could not understand what you mean by "not calling the partitioner could hide a bug in your program, granted this is a bit contrived because likely any bug you find in the partitioner would be easy to find anywhere else". Could you give an example? But I do agree with the last line there. – Sudarshan Apr 16 '14 at 05:11
  • @Sudarshan If the partitioner is not called, it could change the behavior of the program. IMO that means it should always be called. – aaronman Apr 16 '14 at 16:15