
Can anybody help me understand the execution flow of the run() and reduce() methods in a Reducer class? I am trying to calculate the average of word counts in my MapReduce job. My Reducer class receives "word" and "iterable of occurrences" as key-value pairs.

My objective is to calculate the average of word occurrences with respect to all the words in the document. Can the run() method in the reducer iterate through all the keys and count the total number of words? I could then use this sum to find the average while looping through the iterable of values provided with each key.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class AverageReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable average = new IntWritable();

        private static int count = 0;

        @Override
        public void run(Context context) throws IOException, InterruptedException
        {
            // loop through all the keys and increment count
        }

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            int sum = 0;
            // sum up the occurrences of this word
            for (IntWritable val : values)
            {
                sum = sum + val.get();
            }
            // divide by the total number of words counted in run()
            average.set(sum / count);
            context.write(key, average);
        }
    }
Prithvi514

1 Answer


As described here, you can't iterate over the values twice. I also think it is a bad idea to override the run() method: by default it just iterates through the keys and calls reduce() for every key group (source). So you can't calculate the average of word occurrences with only one map-reduce job.
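For reference, the default run() implementation looks roughly like this (a simplified sketch based on the Hadoop source; newer versions add some housekeeping around the value iterator):

    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
            // stream through each key group once and hand it to reduce()
            while (context.nextKey()) {
                reduce(context.getCurrentKey(), context.getValues(), context);
            }
        } finally {
            cleanup(context);
        }
    }

The grouped input is streamed to the reducer only once, so an overridden run() cannot make a first pass to count all the keys and then a second pass that calls reduce() with the precomputed total.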

Aleksei Shestakov