I want to use multiple outputs in a Hadoop job on Elastic MapReduce, so I set up MultipleOutputs in the main() method like so:
MultipleOutputs.addNamedOutput(hadoopJob, "One",
        TextOutputFormat.class, NullWritable.class, Text.class);
MultipleOutputs.addNamedOutput(hadoopJob, "Two",
        TextOutputFormat.class, NullWritable.class, Text.class);
I want "One" to contain the mapper's output and "Two" to contain the reducer's output.
In the setup method of both the mapper and the reducer, I call:
outputWriters = new MultipleOutputs(context);
In the mapper, I call:
outputWriters.write("One", nothing, sampleOutput, "One");
In the reducer, I call:
outputWriters.write("Two", nothing, new Text(thing.getStuff()), "Two");
Finally, in the cleanup method of both the mapper and the reducer, I call:
outputWriters.close();
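To make the arrangement concrete, here is the mapper side of what I described, consolidated into one sketch. The class name SampleMapper, the input key/value types, and the field names are placeholders of my own; this assumes the new org.apache.hadoop.mapreduce API (the reducer is structured the same way, writing to "Two"):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Placeholder mapper illustrating the setup/map/cleanup sequence described above.
public class SampleMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> outputWriters;

    @Override
    protected void setup(Context context) {
        // One MultipleOutputs instance per task attempt.
        outputWriters = new MultipleOutputs<>(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Write map-side records to the named output "One";
        // the last argument is the base path for the output files.
        outputWriters.write("One", NullWritable.get(), value, "One");
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Flush and close the underlying record writers.
        outputWriters.close();
    }
}
```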
When I do this, I get a "file already exists" exception from the reducer: it tries to recreate the output files that the mapper already created.
I can avoid the exception by removing outputWriters.close() from the mapper's cleanup method, but that introduces another problem: none of the mapper output gets written.
What's the proper way to use MultipleOutputs with one named output written from the mapper and another from the reducer? The JavaDocs don't cover this situation, and I haven't found anything useful on Stack Overflow.
Update: this appears to run fine locally. However, when I run it on Elastic MapReduce with S3 output, I hit the "file already exists" exception again. Any ideas for workarounds?