MRUnit does not work with MultipleOutputs

Question

When I run a basic MRUnit test with MultipleOutputs, I get the following exception:

java.lang.NullPointerException
at org.apache.hadoop.fs.Path.<init>(Path.java:105)
at org.apache.hadoop.fs.Path.<init>(Path.java:94)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getDefaultWorkFile(FileOutputFormat.java:264)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:125)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:405)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:387)
at com.skobbler.scratch.MOutputReduce.reduce(MOutputReduce.java:45)
at com.skobbler.scratch.MOutputReduce.reduce(MOutputReduce.java:28)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mrunit.mapreduce.ReduceDriver.run(ReduceDriver.java:265)
at org.apache.hadoop.mrunit.mapreduce.ReducePhaseRunner.runReduce(ReducePhaseRunner.java:85)
at org.apache.hadoop.mrunit.mapreduce.MapReduceDriver.run(MapReduceDriver.java:249)
at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:640)
at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:627)

I found that the mapred.output.dir configuration property is requested, and its value is null. The issue does not appear with simple output (without MultipleOutputs).

MRUnit code:

    @Test
    public void testMultiOutput() throws IOException {
        MapReduceDriver<LongWritable, Text, Text, Text, Text, Text> mapReduceDriver = createMapReduceDrive();
        mapReduceDriver.withInput(new LongWritable(0L), new Text("a,b"));
        mapReduceDriver.withInput(new LongWritable(0L), new Text("a,c"));
        mapReduceDriver.withMultiOutput("foo", new Text("a"), new Text("2"));
        mapReduceDriver.runTest();
    }

    private MapReduceDriver<LongWritable, Text, Text, Text, Text, Text> createMapReduceDrive() {
        MOutputMap mapper = new MOutputMap();
        MOutputReduce reducer = new MOutputReduce();
        return MapReduceDriver.newMapReduceDriver(mapper, reducer);
    }

How can I run the test without specifying a Hadoop system/output path?

Hadoop 2, MRUnit 1.1.0

I'm having trouble reproducing your issue. Usually I see something like `Missing expected outputs for namedOutput ...` or `java.lang.IllegalArgumentException: Named output 'someOutput' not defined`, but not a `NullPointerException`. Could you provide more of your code or a small sample that could be tested? — Keegan, May 27 '15 at 19:32
I worked around this by mocking the reducer's emit so that it uses context.write() in the tests instead. Maybe it's a Windows/Hadoop version issue. — Horatiu Jeflea, May 29 '15 at 06:28

score 2 · Accepted Answer · answered Feb 26 '16 at 13:28

Yes, I ran into this issue too, and I found the solution in the MRUnit source code (TestDriver.java).

You can use the driver's getConfiguration() method to get the job's Configuration object, and then set the output directory on it:

    Configuration conf = mapReduceDriver.getConfiguration();
    conf.set("mapreduce.output.fileoutputformat.outputdir", "aa");
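The value "aa" above is a relative path, so anything written under it would likely end up in the working directory of the test run. As a minimal sketch, assuming you would rather point the property at a scratch location, the helper below uses only the JDK to create a temporary directory. The class name TestOutputDir and its method are hypothetical and not part of MRUnit or Hadoop. Only the property name comes from the answer above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of MRUnit or Hadoop): creates a scratch
// directory whose path can be used as the job's output directory in a test.
final class TestOutputDir {

    // Property name copied from the answer above.
    public static final String OUTPUT_DIR_KEY =
            "mapreduce.output.fileoutputformat.outputdir";

    private TestOutputDir() {
        // Static helper, not meant to be instantiated.
    }

    // Creates a fresh directory under the JVM's temp location and returns
    // its absolute path as a String, ready to be stored in a Configuration.
    public static String createTempOutputDir() throws IOException {
        Path dir = Files.createTempDirectory("mrunit-out-");
        return dir.toAbsolutePath().toString();
    }
}
```

In the test it would be wired in before runTest(), for example with mapReduceDriver.getConfiguration().set(TestOutputDir.OUTPUT_DIR_KEY, TestOutputDir.createTempOutputDir()). The sketch does not delete the directory afterwards, so cleanup is left to the test.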

score 1 · Answer 2 · answered Jun 24 '15 at 17:54

I ran into this same issue recently. I had been using the @RunWith(SpringJUnit4ClassRunner.class) annotation, but according to the comments on the MRUnit JIRA issues https://issues.apache.org/jira/browse/MRUNIT-13 and https://issues.apache.org/jira/browse/MRUNIT-213, tests that use MultipleOutputs need to run with @RunWith(PowerMockRunner.class) together with @PrepareForTest(MyMapper.class) or @PrepareForTest(MyReducer.class).

I hope this helps someone else who runs into this issue.
