I'm trying to use the MAP_OUTPUT_RECORDS counter in the reducer class to calculate the percentage of words in the sample wordcount program.

Here is the code for the setup() method in the reducer:

public static class IntSumReducer extends
    Reducer<Text,FloatWritable,Text,FloatWritable> {
    private FloatWritable result = new FloatWritable();
    private long total = 0;

    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        // Read the built-in MAP_OUTPUT_RECORDS counter as seen from this task.
        total = context.getCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_OUTPUT_RECORDS").getValue();
        System.out.println("total : " + total);
    }
    // reduce() is not shown in this snippet
}

This is the output of the print statement at the end of setup():

total : 1131
total : 487
total : 421
total : 333
total : 101
total : 101
total : 195
total : 185
total : 0

I don't understand:

  1. Why is the setup() method called multiple times? According to its definition, it should be called only once, at the start of the task.
  2. Why does the value of 'MAP_OUTPUT_RECORDS' keep changing? Shouldn't it be a single value (the total output of all the mappers combined)?

I don't think the reducers start before all the mappers have finished executing. Why isn't the 'MAP_OUTPUT_RECORDS' value constant?

Sumit Das

1 Answer

The statement "Any reduce function call should be after all the mappers have done their work" is strictly true only if speculative execution is explicitly turned off. Otherwise, some reduce tasks can start before all the maps are complete.

For details, see this thread on the hadoop-common-user mailing list:

http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201002.mbox/%3C5f6b7e1002011502u774934f9x5800590f264a933a@mail.gmail.com%3E

1) Regarding the repeated setup() calls: your job has probably launched multiple reducers, and setup() is called once for each of them. See:

setup and cleanup methods of Mapper/Reducer in Hadoop MapReduce
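
As a rough illustration of point 1, here is a toy model in plain Java. It is not Hadoop code: ToyReducer and countSetupCalls are hypothetical names made up for this sketch. Each simulated task gets its own reducer instance, so setup() runs once per task rather than once per job:

```java
import java.util.List;

// Toy model of the reducer lifecycle; NOT Hadoop's Reducer class.
class SetupPerTaskDemo {

    // Hypothetical miniature reducer with the same lifecycle shape:
    // setup() once, reduce() once per key, cleanup() once.
    static class ToyReducer {
        int setupCalls = 0;
        int reduceCalls = 0;

        void setup() { setupCalls++; }

        void reduce(String key, List<Integer> values) { reduceCalls++; }

        void cleanup() { }

        // Drives one task: setup, then every key, then cleanup.
        void run(List<String> keys) {
            setup();
            for (String key : keys) {
                reduce(key, List.of(1));
            }
            cleanup();
        }
    }

    // Runs one ToyReducer per partition and returns the total number
    // of setup() calls across all simulated tasks.
    static int countSetupCalls(List<List<String>> partitions) {
        int total = 0;
        for (List<String> keys : partitions) {
            ToyReducer task = new ToyReducer();
            task.run(keys);
            total += task.setupCalls;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
            List.of("apple", "bear"),
            List.of("cat"),
            List.of("dog", "eel", "fox"));
        // Three simulated reduce tasks, so setup() runs three times.
        System.out.println("setup() calls: " + countSetupCalls(partitions));
    }
}
```

With three partitions the model reports three setup() calls, which matches the pattern of one printed "total" line per reduce task.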

2) MAP_OUTPUT_RECORDS keeps changing because, according to the book Hadoop: The Definitive Guide,

MAP_OUTPUT_RECORDS: "The number of map output records produced by all the maps in the job. Incremented every time the collect() method is called on a map’s OutputCollector".

The mappers may still be running when a reducer's setup() method is called, so MAP_OUTPUT_RECORDS can have a different value at each call.
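
To illustrate point 2, the following plain-Java sketch reads a running total after each batch of simulated map output. It is not Hadoop code, and snapshotsAfterEachBatch is a hypothetical helper. It only shows why reads taken at different times can differ, assuming the counter is still being incremented; it does not reproduce the numbers in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of reading a counter that is still being incremented.
class CounterSnapshotDemo {

    // Adds each batch to a running total and records the value a reader
    // would see right after that batch.
    static List<Long> snapshotsAfterEachBatch(long[] recordsPerBatch) {
        long mapOutputRecords = 0; // stand-in for the counter, not Hadoop's Counter
        List<Long> seen = new ArrayList<>();
        for (long batch : recordsPerBatch) {
            mapOutputRecords += batch;
            seen.add(mapOutputRecords);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Each read happens at a different point in the simulated map phase.
        System.out.println(snapshotsAfterEachBatch(new long[] {100, 50, 25}));
    }
}
```

Only a read taken after the last batch sees the final total; every earlier read sees a partial value.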

I hope this answer helps.

prashant khunt