Can anybody help me understand the execution flow of the run() and reduce() methods in a Reducer class? I am trying to calculate the average of word counts in my MapReduce job. My Reducer class receives a word as the key and an iterable of its occurrences as the values.
My objective is to calculate the average of word occurrences with respect to all the words in the document. Can the run() method in the reducer iterate through all the keys and count the total number of words? I could then use this count to find the average by looping through the iterable of values provided with each key.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class AverageReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable average = new IntWritable();
    // Intended to hold the total number of words; nothing sets it yet.
    private static int count = 0;

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        // loop through all the keys and increment count
    }

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum = sum + val.get();
        }
        // Integer division; throws ArithmeticException while count is still 0.
        average.set(sum / count);
        context.write(key, average);
    }
}
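
For reference, the default Reducer.run(Context) in Hadoop calls setup(context) once, then reduce(...) once for each key while context.nextKey() returns true, and finally cleanup(context). The sketch below imitates that call order in plain Java, without any Hadoop classes, to show one possible way to get a count over all keys: buffer the per-key sums in reduce() and only divide in cleanup(), once every key has been seen. The class ReducerFlowSketch, its methods, and the in-memory map are hypothetical stand-ins, and it assumes "count" means the number of distinct keys, as the comment in run() suggests.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a reducer; this is NOT the Hadoop API.
public class ReducerFlowSketch {
    // Per-key sums, buffered until every key has been seen.
    private final Map<String, Integer> sums = new LinkedHashMap<>();
    // Records the order in which the lifecycle methods are called.
    private final List<String> calls = new ArrayList<>();
    // Final output: key -> sum divided by the number of distinct keys.
    private final Map<String, Double> output = new LinkedHashMap<>();

    public void setup() {
        calls.add("setup");
    }

    public void reduce(String key, Iterable<Integer> values) {
        calls.add("reduce:" + key);
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        sums.put(key, sum); // cannot divide yet: the key count is not known
    }

    public void cleanup() {
        calls.add("cleanup");
        int count = sums.size(); // assumption: count = number of distinct keys
        if (count == 0) {
            return; // nothing to emit, and avoids dividing by zero
        }
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            output.put(e.getKey(), (double) e.getValue() / count);
        }
    }

    // Mirrors the order of the default run(): setup, reduce per key, cleanup.
    public void run(Map<String, List<Integer>> grouped) {
        setup();
        try {
            for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
                reduce(e.getKey(), e.getValue());
            }
        } finally {
            cleanup();
        }
    }

    public List<String> calls() {
        return calls;
    }

    public Map<String, Double> output() {
        return output;
    }
}
```

In a real job the same idea would presumably move into overrides of reduce(Text, Iterable<IntWritable>, Context) and cleanup(Context), with a DoubleWritable output value to avoid integer division. It would only give a global count if all keys reach a single reducer, and buffering every key in memory may not scale to a large vocabulary.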