
I'm learning Hadoop, and for word count I have the following mapper code. It's the same as in a question that already exists on Stack Overflow, but the answers there did not resolve my doubts.

package com.company;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

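        // One call per input line: key = byte offset of the line, value = the line text.
        String line = value.toString();
        for (String word : line.split(" ")) {

            // split(" ") yields empty strings for repeated or leading spaces; skip them.
            if (word.length() > 0) {
                // Emit (word, 1); the framework collects these pairs for the shuffle.
                context.write(new Text(word), new IntWritable(1));
            }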

        }

    }

}
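To see what the `map` method above emits for one line without a cluster, here is a plain-Java sketch with no Hadoop dependency. `LineTokenizerDemo` and `emit` are hypothetical names; the loop mirrors `WordCountMapper` but collects `(word, 1)` pairs in a `List` instead of calling `context.write`. It also shows why the `word.length() > 0` check exists: `String.split(" ")` returns empty strings for leading or consecutive spaces.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class LineTokenizerDemo {

    // Mirrors WordCountMapper.map(): split on a single space and emit (word, 1),
    // but collect the pairs in a List instead of writing to a Hadoop Context.
    static List<Map.Entry<String, Integer>> emit(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            // split(" ") yields "" for leading or repeated spaces; skip those.
            if (word.length() > 0) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = " the quick  the";
        // prints [, the, quick, , the]: note the empty strings
        System.out.println(Arrays.toString(line.split(" ")));
        // prints [the=1, quick=1, the=1]
        System.out.println(emit(line));
    }
}
```

If the input can contain tabs or runs of whitespace, splitting on a regex such as `"\\s+"` is a common alternative, though it still yields a leading empty string when the line starts with whitespace.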

Here we can see that the map method of the Mapper class is overridden, and a Context context parameter appears. When I opened the Hadoop jar files, I found that Context is an abstract class nested inside the Mapper class in the Hadoop library, implementing the MapContext interface, like below:

public abstract class Context implements MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    public Context() {
    }
}
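The pattern in that snippet, an abstract inner class `Context` declared inside `Mapper` and passed into `map()`, can be sketched without Hadoop. Everything below is hypothetical and simplified: `MiniMapper`, `ContextDemo`, `WordMapper`, and `runTask` are illustration names, not Hadoop classes, and `setup()`/`cleanup()` and checked exceptions are omitted. It shows who creates the `Context` in miniature: user code never instantiates it; framework code subclasses it and hands the instance to `run()`, which calls `map()` once per record. In the Hadoop 2.x source, as far as I can tell, that framework code is `MapTask`, running in the map task's JVM when the job executes (not when data is copied into HDFS), and the concrete subclass is `WrappedMapper.Context` from `org.apache.hadoop.mapreduce.lib.map`, which delegates to a `MapContextImpl`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for org.apache.hadoop.mapreduce.Mapper
// (no setup()/cleanup(), no checked exceptions).
abstract class MiniMapper<KI, VI, KO, VO> {

    // Like Mapper.Context: an abstract inner class that user code never instantiates.
    public abstract class Context {
        public abstract boolean nextKeyValue();
        public abstract KI getCurrentKey();
        public abstract VI getCurrentValue();
        public abstract void write(KO key, VO value);
    }

    // User code overrides this, as WordCountMapper does.
    protected abstract void map(KI key, VI value, Context context);

    // Like Mapper.run(): called by the framework with a concrete Context.
    public void run(Context context) {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    }
}

public class ContextDemo {

    // Hypothetical user mapper with the same logic as WordCountMapper.
    static class WordMapper extends MiniMapper<Long, String, String, Integer> {
        @Override
        protected void map(Long key, String value, Context context) {
            for (String word : value.split(" ")) {
                if (word.length() > 0) {
                    context.write(word, 1);
                }
            }
        }
    }

    // Plays the framework's role: it, not the user, creates the concrete Context
    // (here an anonymous subclass) over the input records and an output list.
    static List<String> runTask(List<String> lines) {
        WordMapper mapper = new WordMapper();
        List<String> output = new ArrayList<>();
        Iterator<String> input = lines.iterator();

        MiniMapper<Long, String, String, Integer>.Context context = mapper.new Context() {
            private long offset = 0;  // like the LongWritable key: byte offset of the line
            private String current = null;

            @Override public boolean nextKeyValue() {
                if (!input.hasNext()) {
                    return false;
                }
                if (current != null) {
                    offset += current.length() + 1;  // assumes 1-byte chars and '\n' endings
                }
                current = input.next();
                return true;
            }

            @Override public Long getCurrentKey() { return offset; }

            @Override public String getCurrentValue() { return current; }

            @Override public void write(String key, Integer value) {
                output.add(key + "\t" + value);
            }
        };

        mapper.run(context);
        return output;
    }

    public static void main(String[] args) {
        System.out.println(runTask(Arrays.asList("a b", "b")));
    }
}
```

The anonymous `mapper.new Context() { ... }` is the key step: because `Context` is an inner (non-static) class, a subclass instance needs an enclosing mapper, which is the same reason Hadoop's `WrappedMapper` extends `Mapper` so its own `Context` can extend `Mapper.Context`.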

My doubts:

  1. In my WordCountMapper code, the Context object is storing the output key-value pairs. How does data get stored in the Context object? Is it a kind of list? Where can I find its implementation in the library?

  2. Who creates the Context object? Since it's an abstract class, which one is the concrete class for it?

  3. Is it the Hadoop framework itself that creates a Context object once we copy the data into HDFS? How is it created?
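A hedged, simplified model of where `context.write` data goes, relevant to the first question. As far as I can tell from the Hadoop 2.x source, `Context` does not hold the pairs itself: `MapContextImpl.write` forwards to a `RecordWriter` (`MapTask`'s internal `NewOutputCollector` when the job has reducers), which asks the job's `Partitioner` for a partition number and passes the pair to a `MapOutputBuffer`. That buffer holds serialized bytes in memory (sized by `mapreduce.task.io.sort.mb`) and, when it fills past a threshold, sorts and spills to the task's local disk rather than HDFS. The sketch keeps only two ideas: the `HashPartitioner` formula and the fact that pairs are stored as bytes, not as a list of objects. `MapOutputModel` is a hypothetical name, and it uses `String.hashCode()` and `writeUTF` as stand-ins for `Text`, so real partition numbers and byte layouts differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of what sits behind Context.write() in a map task.
public class MapOutputModel {

    // Same formula as Hadoop's default HashPartitioner.getPartition().
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    private final int numReduceTasks;
    // One byte buffer per reduce partition; pairs are stored serialized, not as objects.
    private final List<ByteArrayOutputStream> buffers = new ArrayList<>();
    private final List<Integer> recordCounts = new ArrayList<>();

    MapOutputModel(int numReduceTasks) {
        this.numReduceTasks = numReduceTasks;
        for (int i = 0; i < numReduceTasks; i++) {
            buffers.add(new ByteArrayOutputStream());
            recordCounts.add(0);
        }
    }

    // Roughly what a context.write(key, value) call leads to in this model:
    // pick a partition, then serialize the pair into that partition's buffer.
    void write(String key, int value) {
        int p = getPartition(key, numReduceTasks);
        try {
            DataOutputStream out = new DataOutputStream(buffers.get(p));
            out.writeUTF(key);   // stand-in for Text.write()
            out.writeInt(value); // IntWritable.write() also uses writeInt (4 bytes)
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        recordCounts.set(p, recordCounts.get(p) + 1);
    }

    int records(int partition) { return recordCounts.get(partition); }

    int bytes(int partition) { return buffers.get(partition).size(); }

    public static void main(String[] args) {
        MapOutputModel output = new MapOutputModel(2);
        for (String w : "to be or not to be".split(" ")) {
            output.write(w, 1);
        }
        for (int p = 0; p < 2; p++) {
            System.out.println("partition " + p + ": " + output.records(p)
                    + " records, " + output.bytes(p) + " bytes");
        }
    }
}
```

The `& Integer.MAX_VALUE` mask clears the sign bit so a negative `hashCode()` still maps to a valid partition index, and identical keys always land in the same partition, which is what lets one reducer see every value for a key.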

Any suggestions?

Brian Tompsett - 汤莱恩
JonyLinux
  • @cricket_007, but those answers don't explain how it stores data. And second, what triggers its creation? Are those APIs? – JonyLinux Oct 21 '17 at 18:12
  • It writes Writables. It stores metadata about a Job. It stores the Hadoop configuration object. You can define and store your own counters in it within a map... Read the JavaDoc... It's all an API – OneCricketeer Oct 21 '17 at 18:21
  • @cricket_007, Writables are all about byte streams. So you mean the context object writes outputs directly into byte streams? – JonyLinux Oct 21 '17 at 18:29
  • No, not directly. Look at the code... `context.write(new Text(word)` – OneCricketeer Oct 21 '17 at 20:03
  • But there is a `BytesWritable` class – OneCricketeer Oct 21 '17 at 20:05
  • @cricket_007, look at this: context.write(new Text(word), new IntWritable(1)); it's the Mapper output, which the context object is writing somewhere. Could you please tell me where I can find this write method, so that I can see where it's actually writing? – JonyLinux Oct 22 '17 at 09:38
  • I'm not actually sure where it writes, nor should you care until you think it is a problem. It buffers in memory somewhere until the task finishes, then it is shuffled and sorted prior to a Reduce task – OneCricketeer Oct 22 '17 at 14:48
  • @cricket_007, fair enough...!! – JonyLinux Oct 22 '17 at 15:29
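The sort and group step mentioned in that comment thread can be modeled in a few lines. After the shuffle, each reduce call sees one key, in sorted key order, together with all values emitted for it. This is a simplified in-memory sketch: `ShuffleSortDemo`, `group`, and `reduce` are hypothetical names, and a `TreeMap` stands in for Hadoop's merge of sorted map-output spills.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSortDemo {

    // Group (key, value) pairs by key, with keys in sorted order,
    // roughly what a reducer's input looks like after shuffle and sort.
    static TreeMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // The word-count reduce step: sum the 1s for each word.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String w : "to be or not to be".split(" ")) {
            mapOutput.add(new SimpleEntry<>(w, 1));
        }
        TreeMap<String, List<Integer>> grouped = group(mapOutput);
        System.out.println(grouped);          // {be=[1, 1], not=[1], or=[1], to=[1, 1]}
        System.out.println(reduce(grouped));  // {be=2, not=1, or=1, to=2}
    }
}
```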

0 Answers