
I want to perform a few operations on a single text file.

For example: Task 1: Count all the words

Task 2: Count words ending with specific characters

Task 3: Count words occurring multiple times.

What is the best way of achieving this?

Do I need to write multiple mappers and multiple reducers? Multiple mappers and a single reducer? Or can it be done with a single mapper and a single reducer?

It would be great if someone could provide a programming example.

TjS
  • The only way to control the number of mappers is your data size. I believe you mean total applications? Also, word count is the exact sample code given on hadoop site, so where are you stuck? Stackoverflow is not a tutorial or code handout service, unfortunately – OneCricketeer Mar 16 '18 at 03:09

1 Answer

Use counters to count what you are looking for. After MapReduce completes, just fetch the counters in the driver class.

For example, the total number of words and the number of words starting with "z" or "Z" can be counted in the mapper:

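import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            word.set(token);
            // Count every token seen by any mapper.
            context.getCounter("my_counters", "TOTAL_WORDS").increment(1);
            // Count tokens starting with "z" or "Z".
            if (token.toUpperCase().startsWith("Z")) {
                context.getCounter("my_counters", "Z_WORDS").increment(1);
            }
            // Emit (word, 1) so the reducer can sum the occurrences per word.
            context.write(word, one);
        }
    }
}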
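The question's second task asks about words ending with specific characters, while the mapper above checks the start of each word. A minimal sketch of how that condition could be swapped is shown here in plain Java, with no Hadoop dependency so it can be tried on its own. The class name `SuffixCheck`, the method `endsWithAny`, and the sample suffixes are hypothetical and not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SuffixCheck {

    // Hypothetical helper: true if the token ends with any of the
    // given suffixes, ignoring case.
    public static boolean endsWithAny(String token, List<String> suffixes) {
        String lower = token.toLowerCase(Locale.ROOT);
        for (String suffix : suffixes) {
            if (lower.endsWith(suffix.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample suffixes chosen only for illustration.
        List<String> suffixes = Arrays.asList("ing", "ed");
        System.out.println(endsWithAny("Running", suffixes)); // true
        System.out.println(endsWithAny("walked", suffixes));  // true
        System.out.println(endsWithAny("zebra", suffixes));   // false
    }
}
```

Inside the mapper, a call such as `endsWithAny(token, suffixes)` could replace the `startsWith("Z")` test, with the counter renamed to match (for example a hypothetical `SUFFIX_WORDS`).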

The number of distinct words and the number of words appearing fewer than 4 times can be counted with counters in the reducer:

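import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int wordCount = 0;
        // reduce() is called once per distinct key, so this counts distinct words.
        context.getCounter("my_counters", "DISTINCT_WORDS").increment(1);
        // Sum the occurrences emitted by the mappers for this word.
        for (IntWritable val : values) {
            wordCount += val.get();
        }
        // Count words that appear fewer than 4 times in total.
        if (wordCount < 4) {
            context.getCounter("my_counters", "WORDS_LESS_THAN_4").increment(1);
        }
    }
}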
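To sanity-check the counter values on a small sample before running a job, the same counting logic can be simulated in memory. The sketch below is plain Java and does not use Hadoop. The class name `LocalCounterCheck` and the `REPEATED_WORDS` counter (words occurring more than once, as in the question's third task) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class LocalCounterCheck {

    // Mimics the map and reduce phases in memory and returns the
    // counter values keyed by counter name.
    public static Map<String, Long> count(String text) {
        Map<String, Integer> perWord = new HashMap<>();
        long total = 0;

        // "Map" phase: tokenize on whitespace and tally each token.
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            perWord.merge(itr.nextToken(), 1, Integer::sum);
            total++;
        }

        // "Reduce" phase: one pass over the per-word totals.
        long repeated = 0;
        for (int occurrences : perWord.values()) {
            if (occurrences > 1) {
                repeated++;
            }
        }

        // TreeMap keeps the counters sorted by name when printed.
        Map<String, Long> counters = new TreeMap<>();
        counters.put("TOTAL_WORDS", total);
        counters.put("DISTINCT_WORDS", (long) perWord.size());
        counters.put("REPEATED_WORDS", repeated);
        return counters;
    }

    public static void main(String[] args) {
        // Prints DISTINCT_WORDS=4, REPEATED_WORDS=2, TOTAL_WORDS=6.
        for (Map.Entry<String, Long> e : count("to be or not to be").entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

In the reducer, the equivalent of `REPEATED_WORDS` would be a counter incremented when `wordCount > 1`.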

In the driver class, fetch the counters once the job has finished. The code below goes after the line where you submit the job and wait for it to complete (for example, after job.waitForCompletion(true)):

CounterGroup group = job.getCounters().getGroup("my_counters");

for (Counter counter : group) {
   System.out.println(counter.getName() + "=" + counter.getValue());
}
Gyanendra Dwivedi