"a reducer is different than a reduce task. A reducer can run multiple reduce tasks". Can someone explain this using the example below?

foo.txt: Sweet, this is the foo file
bar.txt: This is the bar file

I am using 2 reducers. What are the reduce tasks here, and on what basis are multiple reduce tasks generated in a reducer?

Arighna

3 Answers

Reducer is a class that contains a reduce function, as shown below:

protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
    throws IOException, InterruptedException {
  // called once per key, with all the values grouped under that key
}

A reduce task is a program running on a node that executes the reduce function of the Reducer class.

You can think of a reduce task as a running instance of the Reducer.

Have a look at the Apache MapReduce tutorial page for more details (Payload section).
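To make the class-versus-task distinction concrete, here is a minimal plain-Java sketch that does not use the Hadoop API. `WordCountReducer` and `ReduceTask` are hypothetical names, and the split of keys between the two tasks is hard-coded purely for illustration; in a real job the framework's partitioner decides which task receives which key.

```java
import java.util.*;

public class ReducerVsTask {
    // The "reducer": code written once. Hypothetical stand-in for a Hadoop Reducer subclass.
    static class WordCountReducer {
        int reduce(String key, List<Integer> values) {
            int sum = 0;
            for (int v : values) sum += v;
            return sum;
        }
    }

    // A "reduce task": one running instance of the reducer code, fed one partition of grouped input.
    static class ReduceTask {
        final int id;
        final WordCountReducer reducer = new WordCountReducer(); // each task owns its own instance

        ReduceTask(int id) { this.id = id; }

        Map<String, Integer> run(Map<String, List<Integer>> partition) {
            Map<String, Integer> out = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : partition.entrySet()) {
                // reduce() is called once per key in this task's partition
                out.put(e.getKey(), reducer.reduce(e.getKey(), e.getValue()));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Hard-coded partitions (an assumption for this sketch, not how Hadoop assigns keys).
        Map<String, List<Integer>> p0 = new TreeMap<>();
        p0.put("foo", Arrays.asList(1));
        p0.put("is", Arrays.asList(1, 1));
        Map<String, List<Integer>> p1 = new TreeMap<>();
        p1.put("bar", Arrays.asList(1));
        p1.put("the", Arrays.asList(1, 1));

        ReduceTask t0 = new ReduceTask(0);
        ReduceTask t1 = new ReduceTask(1);
        System.out.println("task " + t0.id + " -> " + t0.run(p0));
        System.out.println("task " + t1.id + " -> " + t1.run(p1));
    }
}
```

Both tasks execute the same `reduce` code, but each holds its own instance and sees only its own keys.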

Ravindra babu

From my understanding, a reducer is a slot of computational resource that can be used to run reduce tasks. A reducer is assigned a task, which it runs to completion or failure; once the task reaches an end state and cleanup is done, the reducer is available to process another reduce task.

In YARN, the concepts are a bit different though: there are no fixed reduce slots, and each task runs in a container that is requested on demand.
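The slot view can be sketched as a fixed-size thread pool, where each worker thread stands in for a reduce slot and picks up the next reduce task once it is free. This is only an analogy in plain Java, not Hadoop code; `ReduceSlots` and `run` are hypothetical names.

```java
import java.util.*;
import java.util.concurrent.*;

public class ReduceSlots {
    // Runs `tasks` reduce tasks on `slots` worker threads.
    // Returns, per slot (thread name), the ids of the tasks that slot executed.
    static Map<String, List<Integer>> run(int slots, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(slots);
        Map<String, List<Integer>> ranOn = new ConcurrentHashMap<>();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final int taskId = i;
            futures.add(pool.submit(() -> {
                String slot = Thread.currentThread().getName();
                ranOn.computeIfAbsent(slot, k -> Collections.synchronizedList(new ArrayList<>()))
                     .add(taskId);
            }));
        }
        try {
            for (Future<?> f : futures) f.get(); // wait until every task reaches an end state
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return ranOn;
    }

    public static void main(String[] args) {
        // 2 slots, 5 reduce tasks: at least one slot has to run more than one task.
        run(2, 5).forEach((slot, ids) -> System.out.println(slot + " ran reduce tasks " + ids));
    }
}
```

Which slot runs which task depends on scheduling, so the exact assignment can vary between runs.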

rahulbmv

The reducer is the code you are writing (or reusing) to process the data coming in.

The reduce task is the actual instantiation of the reducer code, running on a node in your cluster. The task has a state machine and might fail. In case of failure, another reduce task is spun up to restart the computation; each such run is called a reduce task attempt. The number of retries is finite (the "maximum number of attempts").
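The attempt mechanism can be sketched in plain Java as a bounded retry loop. `TaskAttempts` and `runWithRetries` are hypothetical names, and the limit of 4 used below is an assumption based on what I believe is the default of `mapreduce.reduce.maxattempts` in Hadoop 2 and later; check your own configuration.

```java
import java.util.function.IntPredicate;

public class TaskAttempts {
    // Runs attempts 0, 1, 2, ... until one succeeds or maxAttempts is used up.
    // Returns the number of the successful attempt, or -1 if every attempt failed.
    static int runWithRetries(int maxAttempts, IntPredicate attemptSucceeds) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (attemptSucceeds.test(attempt)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical task whose first two attempts fail (for example, the node died).
        System.out.println(runWithRetries(4, attempt -> attempt >= 2)); // prints 2
        // Hypothetical task that never succeeds: all 4 attempts are used up.
        System.out.println(runWithRetries(4, attempt -> false));        // prints -1
    }
}
```

The logical reduce task stays the same across attempts; only the attempt number changes.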

You can configure n reducers (as in reduce tasks), which is the maximum number of reduce tasks that can run in parallel at any point during job execution (setting aside speculative execution).
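As a rough illustration of how the configured number of reduce tasks determines which task sees which key, here is a plain-Java sketch applied to the two files from the question. The partition formula is the one used by Hadoop's default `HashPartitioner`, but everything else is an assumption: `PartitionDemo` and `wordCount` are hypothetical names, map, shuffle and reduce are collapsed into one loop, and keys are hashed with `String.hashCode()`, whereas a real job with `Text` keys hashes differently, so the actual assignment of words to tasks may not match.

```java
import java.util.*;

public class PartitionDemo {
    // Same formula as Hadoop's default HashPartitioner:
    // non-negative hash of the key, modulo the number of reduce tasks.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Word count over the given lines; returns one sorted word -> count map per reduce task.
    static List<Map<String, Integer>> wordCount(List<String> lines, int numReduceTasks) {
        List<Map<String, Integer>> perTask = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            perTask.add(new TreeMap<>());
        }
        for (String line : lines) {
            // Lowercase and split on anything that is not a letter.
            for (String word : line.toLowerCase().split("[^a-z]+")) {
                if (word.isEmpty()) continue;
                perTask.get(partition(word, numReduceTasks)).merge(word, 1, Integer::sum);
            }
        }
        return perTask;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Sweet, this is the foo file", "This is the bar file");
        List<Map<String, Integer>> out = wordCount(lines, 2);
        for (int i = 0; i < out.size(); i++) {
            System.out.println("reduce task " + i + " -> " + out.get(i));
        }
    }
}
```

Each key goes to exactly one reduce task, and the split follows the hash of the key, not alphabetical order. With n set to 1, every word would land in the same task.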

Thomas Jungblut
  • Quoting the retry mechanism in the answer is useful – Ravindra babu Mar 08 '16 at 13:54
  • So you mean that each reducer executes one reduce task, provided it does not fail, right? For my example, the first reduce task would produce: bar 1, file 2, foo 1, is 2. And the second reduce task would produce: sweet 1, the 2, this 2. Please confirm whether this is correct. – Arighna Mar 08 '16 at 23:18