Why is MultipleOutputs not working for this Map Reduce program?

Question

I have a Mapper class that emits a Text key and an IntWritable value, which can be 1, 2, or 3. Depending on the value, I have to write to one of three different files with different keys. Instead, I am getting a single output file with no records in it. Also, is there a good MultipleOutputs example (with explanation) you could point me to?

My driver class has this code:

    MultipleOutputs.addNamedOutput(job, "name", TextOutputFormat.class, Text.class, IntWritable.class);
    MultipleOutputs.addNamedOutput(job, "attributes", TextOutputFormat.class, Text.class, IntWritable.class);
    MultipleOutputs.addNamedOutput(job, "others", TextOutputFormat.class, Text.class, IntWritable.class);

My reducer class is:

public static class Reduce extends Reducer<Text, IntWritable, Text, NullWritable> {

    private MultipleOutputs mos;
    public void setup(Context context) {
        mos = new MultipleOutputs(context);
    }
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        String CheckKey = values.toString();
        if("1".equals(CheckKey)) {
            mos.write("name", key, new IntWritable(1));
        }
        else if("2".equals(CheckKey)) {
            mos.write("attributes", key, new IntWritable(2));
        }
        else if("3".equals(CheckKey)) {
            mos.write("others", key,new IntWritable(3));
        }

        /* for (IntWritable val : values) {
            sum += val.get();
        }*/
        //context.write(key, null);
    }
    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}

P.S. I am new to Hadoop/MapReduce programming.

A first random question: are you completely sure that mos.write is called? The values.toString() looks suspicious. — DDW, Oct 04 '13 at 10:03
I don't think so: Reduce input records=30 13/10/04 12:46:52 INFO mapred.JobClient: Reduce input groups=3 13/10/04 12:46:52 INFO mapred.JobClient: Combine output records=0 13/10/04 12:46:52 INFO mapred.JobClient: Physical memory (bytes) snapshot=230494208 13/10/04 12:46:52 INFO mapred.JobClient: Reduce output records=0 13/10/04 12:46:52 INFO mapred.JobClient: Virtual memory (bytes) snapshot=700944384 13/10/04 12:46:52 INFO mapred.JobClient: Map output records=30 — David, Oct 04 '13 at 10:05
This was the log I got. Output records is zero, though the reducer gets 3 records. Also, is there any good MultipleOutputs example you could guide me to? — David, Oct 04 '13 at 10:09
Can you trim the CheckKey and then check the equals conditions for 1, 2, 3? — Binary01, Oct 04 '13 at 10:12
Tried. Same result. Reduce input groups has 3 records, but Combine output records=0. — David, Oct 04 '13 at 10:20
Yes, but nothing assures you that the input records have the values 1, 2, 3! — DDW, Oct 04 '13 at 10:20
Previously, when I was writing the pairs to one output file, the values being printed were fine. I haven't changed the Mapper, so I'm assuming it must be working fine. — David, Oct 04 '13 at 10:22
If you print a collection to a string, it has brackets, for example. — DDW, Oct 04 '13 at 10:23
How can I extract values out of an Iterable? I mean, would it work the same way as collections? — David, Oct 04 '13 at 10:30

score 2 · Answer 1 · answered Oct 04 '13 at 11:39

ArrayList<Integer> l = new ArrayList<Integer>();
l.add(1);
System.out.println(l.toString());

prints "[1]", not "1", so

values.toString()

will not give "1"

Apart from that, I just tried printing an Iterable and it only gave an object reference, so that is definitely your problem. If you want to iterate over the values, do as in the example below:

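Iterator<IntWritable> valueIterator = values.iterator();
while (valueIterator.hasNext()) {
    // The reducer's values are IntWritable, so read the int with get()
    int value = valueIterator.next().get();
    // compare value against 1, 2, 3 here
}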

Note that you can only iterate once!
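
To make the toString() point concrete, here is a minimal plain-Java sketch with no Hadoop dependency. The `Values` class is a hypothetical stand-in for the Iterable the reducer receives, and `elementsAsStrings` is an illustrative helper, not part of any library. It only assumes that the Iterable does not override toString(), which matches the "reference" output described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class IterableToStringDemo {

    /** Hypothetical Iterable that, like many non-collection iterables, does not override toString(). */
    static final class Values implements Iterable<Integer> {
        private final List<Integer> items;

        Values(Integer... items) {
            this.items = Arrays.asList(items);
        }

        @Override
        public Iterator<Integer> iterator() {
            return items.iterator();
        }
    }

    /** Reads the elements one by one instead of relying on toString(). */
    static List<String> elementsAsStrings(Iterable<Integer> values) {
        List<String> out = new ArrayList<>();
        for (Integer v : values) {
            out.add(String.valueOf(v));
        }
        return out;
    }

    public static void main(String[] args) {
        Values values = new Values(1);
        // toString() is Object's default (class name plus hash), so this never matches.
        System.out.println("1".equals(values.toString()));           // false
        // Iterating exposes the actual element.
        System.out.println(elementsAsStrings(values).contains("1")); // true
    }
}
```

The comparison has to happen on each element obtained from the iterator, not on the Iterable as a whole.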

Judge Mental · Answer 2 · 2013-10-04T13:25:32.850

Your problem statement is muddled. What do you mean, "depending on the values"? The reducer gets an Iterable of values, not a single value. Something tells me that you need to move the multiple output code in your reducer inside the loop you have commented out for taking the sum.
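
As a rough illustration of what "inside the loop" means, here is a plain-Java sketch of the per-value routing, deliberately written without Hadoop so it runs on its own. The class and method names (`PerValueRouting`, `namedOutputFor`, `route`) are made up for the example; only the output names "name", "attributes", and "others" come from the driver code in the question. In the real reducer, the line that records a pair would presumably be the `mos.write(...)` call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PerValueRouting {

    /** Maps one value to a named output from the driver, or null if none applies. */
    static String namedOutputFor(int value) {
        switch (value) {
            case 1: return "name";
            case 2: return "attributes";
            case 3: return "others";
            default: return null;
        }
    }

    /** Walks the values once and records which named output each (key, value) pair goes to. */
    static Map<String, List<String>> route(String key, Iterable<Integer> values) {
        Map<String, List<String>> written = new LinkedHashMap<>();
        for (int value : values) {
            String output = namedOutputFor(value);
            if (output != null) {
                // In the reducer, this is where mos.write(output, key, ...) would go.
                written.computeIfAbsent(output, k -> new ArrayList<>()).add(key + "\t" + value);
            }
        }
        return written;
    }

    public static void main(String[] args) {
        // One key with three values: two go to "name", one to "others".
        System.out.println(route("foo", Arrays.asList(1, 3, 1)));
    }
}
```

The decision is made once per value, so a key whose values differ can end up in more than one output.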

Or perhaps you don't need a reducer at all and can take care of this in the map phase. If you are using the reduce phase to end up with exactly 4 files by using a single reduce task, then you can also achieve what you want by flipping the key and value in your map phase and forgetting about MultipleOutputs altogether, because you'll end up with only 3 working reduce tasks, one for each of your int values. To get the 4th one you can output two copies of the record in each map call using a special key to indicate that the output is meant for the normal file, not one of the three special files. Normally I would not recommend such a course of action as you have severe bounds on the level of parallelism you can achieve in the reduce phase when the number of keys is small.

You should also add some anomalous-data handling code to the end of your 'if' ladder that increments a counter or something if you encounter a value that is not one of the three you are expecting.
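
A minimal sketch of that final `else` branch, again in plain Java so it can be run without a cluster. The `UnexpectedValueCounter` class and its `unexpected` field are hypothetical stand-ins; in a real reducer the same role could be played by a job counter, for example one obtained through `context.getCounter(...)`, which is an assumption about how you would wire it in rather than something taken from the question.

```java
import java.util.Arrays;
import java.util.List;

final class UnexpectedValueCounter {

    /** Hypothetical stand-in for a job counter: counts values outside 1..3. */
    private long unexpected = 0;

    /** Same 'if' ladder as in the question, plus a final else for anomalous values. */
    String namedOutputFor(int value) {
        if (value == 1) {
            return "name";
        } else if (value == 2) {
            return "attributes";
        } else if (value == 3) {
            return "others";
        } else {
            unexpected++; // record the anomaly instead of silently dropping it
            return null;
        }
    }

    long unexpectedCount() {
        return unexpected;
    }

    public static void main(String[] args) {
        UnexpectedValueCounter router = new UnexpectedValueCounter();
        List<Integer> values = Arrays.asList(1, 2, 7, 3, 0);
        for (int v : values) {
            router.namedOutputFor(v);
        }
        System.out.println(router.unexpectedCount()); // 2 (the values 7 and 0)
    }
}
```

A non-zero count after the job is a quick signal that the mapper is emitting something other than 1, 2, or 3.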
