Use counters to count what you are looking for. After the MapReduce job completes, fetch the counter values in the driver class.
For example, the total number of words and the number of words starting with "z" or "Z" can be counted in the mapper:
    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                String hasKey = itr.nextToken();
                word.set(hasKey);
                // Every token counts toward the total.
                context.getCounter("my_counters", "TOTAL_WORDS").increment(1);
                // Case-insensitive check for words starting with "z".
                if (hasKey.toUpperCase().startsWith("Z")) {
                    context.getCounter("my_counters", "Z_WORDS").increment(1);
                }
                context.write(word, one);
            }
        }
    }
The number of distinct words and the number of words appearing fewer than 4 times can be counted with counters in the reducer, because reduce() is called once per distinct key:
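    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int wordCount = 0;
            // One reduce() call per distinct word.
            context.getCounter("my_counters", "DISTINCT_WORDS").increment(1);
            for (IntWritable val : values) {
                wordCount += val.get();
            }
            if (wordCount < 4) {
                context.getCounter("my_counters", "WORDS_LESS_THAN_4").increment(1);
            }
            // Only counters are updated here; add a context.write(...) call
            // if the job should also emit per-word totals.
        }
    }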
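In the driver class, fetch the counters. The code below goes after the line where the job has been submitted and has finished running (for example, after job.waitForCompletion(true) returns), since the counter values are only final once the job is complete: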
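    // Needs org.apache.hadoop.mapreduce.Counter and
    // org.apache.hadoop.mapreduce.CounterGroup imported in the driver.
    CounterGroup group = job.getCounters().getGroup("my_counters");
    // Print every counter in the group as name=value.
    for (Counter counter : group) {
        System.out.println(counter.getName() + "=" + counter.getValue());
    }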
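To sanity-check what the four counters should report before running a real job, the same counting rules can be simulated in plain Java without Hadoop. This is only a rough sketch: the class name CounterSimulation, the count helper and the sample sentence are made up for illustration, and a map stands in for the Hadoop counter group.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-in for the job: no Hadoop classes are used here.
public class CounterSimulation {

    // Computes the same four tallies that the mapper and reducer counters track.
    static Map<String, Long> count(String text) {
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("TOTAL_WORDS", 0L);
        counters.put("Z_WORDS", 0L);
        counters.put("DISTINCT_WORDS", 0L);
        counters.put("WORDS_LESS_THAN_4", 0L);

        // "Map" phase: one increment per token, as in WordCountMapper.
        Map<String, Integer> perWord = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            counters.merge("TOTAL_WORDS", 1L, Long::sum);
            if (token.toUpperCase().startsWith("Z")) {
                counters.merge("Z_WORDS", 1L, Long::sum);
            }
            perWord.merge(token, 1, Integer::sum);
        }

        // "Reduce" phase: one pass per distinct key, as in WordCountReducer.
        for (int wordCount : perWord.values()) {
            counters.merge("DISTINCT_WORDS", 1L, Long::sum);
            if (wordCount < 4) {
                counters.merge("WORDS_LESS_THAN_4", 1L, Long::sum);
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        // Made-up sample input: 8 tokens, 4 distinct words (keys are case-sensitive).
        String sample = "zebra zoo apple apple apple apple Zen zoo";
        count(sample).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

For this sample, "apple" appears exactly 4 times, so it is not counted under WORDS_LESS_THAN_4, while "zebra", "zoo" and "Zen" are. Note that the keys are case-sensitive ("Zen" and "zen" would be distinct words), even though the "z" check is not.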