I'm learning Hadoop, and I have some mapper code for word count. It's similar to a question that already exists on Stack Overflow, but that question's answer did not resolve my doubts.
package com.company;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        for (String word : line.split(" ")) {
            if (word.length() > 0) {
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }
}
Here we can see that the map method of the Mapper class is overridden, and a Context parameter comes into play. When I opened the Hadoop jar files, I found that Context is an abstract class nested inside the Mapper class in the Hadoop library, implementing the MapContext interface, like below:
public abstract class Context implements MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    public Context() {
    }
}
My doubts:
In my WordCountMapper code, the Context object stores the output key-value pairs. How does the data get stored in the Context object? Is it some kind of list? Where can I find its implementation in the library?
Who creates the Context object? Since it's an abstract class, which class is its concrete implementation?
Is it the Hadoop framework itself that creates a Context object once we copy the data into HDFS? How does it come into existence?
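To make my doubt concrete, here is a toy sketch of what I currently understand. This is my own simplification in plain Java, not the real Hadoop source; the names ToyFramework and CollectingContext are invented for illustration. My understanding is that the framework side instantiates some concrete subclass of the abstract Context and passes it into map(), so the mapper code only ever sees the abstract type:

```java
import java.util.ArrayList;
import java.util.List;

public class ToyFramework {

    // Simplified stand-in for Mapper.Context: abstract, like in Hadoop.
    static abstract class Context {
        abstract void write(String key, int value);
    }

    // The "framework side": a concrete Context the user code never names.
    // Here write() just appends to a list; in the real library I believe it
    // instead hands the pair to a RecordWriter rather than storing a list.
    static class CollectingContext extends Context {
        final List<String> output = new ArrayList<>();

        @Override
        void write(String key, int value) {
            output.add(key + "\t" + value);
        }
    }

    // Simplified mapper mirroring my WordCountMapper's logic.
    static void map(String line, Context context) {
        for (String word : line.split(" ")) {
            if (word.length() > 0) {
                context.write(word, 1);
            }
        }
    }

    public static void main(String[] args) {
        // The framework, not the user code, constructs the concrete Context
        // and drives map() with it.
        CollectingContext context = new CollectingContext();
        map("hello hadoop hello", context);
        System.out.println(context.output);
    }
}
```

Is this roughly the right mental model, and if so, which concrete class plays the CollectingContext role in the actual Hadoop jars?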
Any suggestions?