2

While learning about MapReduce, I encountered this question:

A given MapReduce program has the Map phase generate 100 key-value pairs with 10 unique keys.

How many Reduce tasks can this program have such that at least one Reduce task will certainly be assigned no keys when a hash partitioner is used (select all answers that are correct)?

  • [ ] A. 3
  • [ ] B. 11
  • [ ] C. 50
  • [ ] D. 101

The answers are B, C, D.

Since the number of unique keys is 10, we must have at least 10 reduce tasks, plus at least one reduce task that receives no key.

I am not able to understand how these answers were arrived at. Please help me understand this.

Nickolay
  • I looked up the source of the question and edited it into your question, along with the explanation. Which part of explanation do you need help with? – Nickolay Apr 19 '15 at 16:39

4 Answers

1

Each unique key from the map output is assigned to exactly one reduce task. If there are 10 unique keys and there are 11, 50, or 101 reduce tasks, then at most 10 reduce tasks can receive a key, so there will necessarily be some reduce tasks that have no keys.
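The counting argument can be checked with a small simulation. This is only a sketch: `partition`, `empty_reducers`, and the hash values in `key_hashes` are hypothetical names and data made up for illustration, and the modulo rule is an assumed stand-in for a hash partitioner, not the actual Hadoop code.

```python
def partition(key_hash: int, num_reducers: int) -> int:
    """Assumed hash-partitioner rule: non-negative hash modulo reducer count."""
    return (key_hash & 0x7FFFFFFF) % num_reducers


def empty_reducers(key_hashes, num_reducers):
    """Return the indices of reducers that receive no key at all."""
    used = {partition(h, num_reducers) for h in key_hashes}
    return [r for r in range(num_reducers) if r not in used]


# Hypothetical hash codes for the 10 unique keys; any 10 values would do.
key_hashes = list(range(100, 110))

for n in (3, 11, 50, 101):
    # With 10 keys, at most 10 reducers are used, so at least n - 10 stay empty.
    print(n, "reducers ->", len(empty_reducers(key_hashes, n)), "empty")
```

Whatever hash values are plugged in, the number of empty reducers can never drop below `n - 10`, which is why 11, 50, and 101 qualify and 3 does not.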

Jeremy Beard
1

As there are 10 unique keys, at most 10 reducers can receive keys, and since we want 1 reducer with no keys assigned, in total it's 11 reducers.

If the number of reducers is 11 or more, at least one reducer is guaranteed to receive no keys. So any number greater than or equal to 11 would be an answer.

Ani Menon
1

Hash partitioner in this context merely means reduce tasks will be consolidated by unique key. It is assumed that a reduce task is completed on only one server, so each of the 10 tasks is atomic.

The modulo operator (or any reasonable partitioner) will ensure that each of the 3 servers/reducers is active for the case of 10 tasks.

For the other options, if there are more "reducers" than tasks, all tasks will be assigned to one "reducer" (only one remainder), if we are to believe the partitioning function. This is ridiculous or at least confusing without additional context. Apparently, partitioning is only required when the number of tasks exceeds the number of reducers/servers.

Russell
0

To guarantee that one reducer output is an empty file, i.e., that no key is assigned to that reducer, we need at least 11 reducers, because the HashPartitioner distributes keys based on a hash function. The reducer outputs eligible to receive data here are part-r-00000 to part-r-00009.

Reducer number = key hash code % n (n = number of reducers)

So the possible remainders are 0 to n-1. Here we have 10 unique keys, i.e., at most 10 different remainders. We can get empty reducer files even if the number of reducers is less than the number of unique keys, since several keys may share a remainder. But once the number of reducers exceeds the number of unique keys, at least one reducer file is empty even in the worst case.
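To make the remainder rule concrete, here is a minimal sketch. As far as I know, Hadoop's default HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the Python below mimics that rule. The names `reducer_for` and `reducer_loads` and both lists of hash codes are hypothetical, chosen only for illustration.

```python
def reducer_for(key_hash: int, n: int) -> int:
    # Mask to a non-negative 31-bit value, then take the remainder modulo n.
    return (key_hash & 0x7FFFFFFF) % n


def reducer_loads(key_hashes, n):
    """Count how many unique keys land on each of the n reducers."""
    loads = [0] * n
    for h in key_hashes:
        loads[reducer_for(h, n)] += 1
    return loads


# Two hypothetical sets of 10 unique-key hash codes.
spread_hashes = list(range(10))                 # remainders cycle through 0, 1, 2
clustered_hashes = [3 * i for i in range(10)]   # every hash is a multiple of 3

print(reducer_loads(spread_hashes, 3))      # every reducer gets some keys
print(reducer_loads(clustered_hashes, 3))   # all keys collide on reducer 0
print(reducer_loads(clustered_hashes, 11))  # 11 slots, 10 keys: one slot stays at 0
```

With 3 reducers, an empty reducer is possible but depends on the hash values, so it is not certain. With 11 or more reducers, an empty reducer is unavoidable regardless of the hash values.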

Shawn Mehan
Naga