I'm trying to use the MAP_OUTPUT_RECORDS counter in the reducer class to calculate the percentage of words in the sample wordcount program.
Here is the code for the setup() method in the reducer:
public static class IntSumReducer
        extends Reducer<Text, FloatWritable, Text, FloatWritable> {
    private FloatWritable result = new FloatWritable();
    private long total = 0;

    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        total = context.getCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_OUTPUT_RECORDS").getValue();
        System.out.println("total : " + total);
    }
This is the output of the print statement in the last line:
total : 1131
total : 487
total : 421
total : 333
total : 101
total : 101
total : 195
total : 185
total : 0
I don't understand:
- Why is the setup() method getting called multiple times? According to its definition, it should be called only once, at the start of the task.
- Why does the value of 'MAP_OUTPUT_RECORDS' keep changing? Shouldn't it be a single value (the total output of all the mappers combined)?
I don't think the reducers start before all the mappers have finished executing, so why isn't the 'MAP_OUTPUT_RECORDS' value constant?
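For context, the calculation I'm after is just each word's count divided by the total number of map output records. Below is a minimal plain-Java sketch of that arithmetic only, with no Hadoop involved. The class name WordShare, the method percentage, and the sample counts are hypothetical, made up for illustration; the hard-coded total stands in for whatever value the counter would supply.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: it shows only the arithmetic.
class WordShare {
    // Returns count as a percentage of total.
    // Returns 0 when total is 0 to avoid dividing by zero.
    static float percentage(long count, long total) {
        if (total == 0) {
            return 0f;
        }
        return 100f * count / total;
    }

    public static void main(String[] args) {
        // Made-up sample data: word -> number of occurrences.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("hadoop", 3L);
        counts.put("map", 1L);

        // Stand-in for the MAP_OUTPUT_RECORDS value (assumed, not read from a job).
        long total = 4;

        for (Map.Entry<String, Long> e : counts.entrySet()) {
            System.out.println(e.getKey() + " : " + percentage(e.getValue(), total));
        }
    }
}
```

The zero check matters here because one of the printed totals above is 0, and a division by that value would otherwise produce NaN or Infinity in the output.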