With Apache MRUnit I'm able to unit test my MapReduce program locally before running it on the cluster.
My program needs to read from DistributedCache, so I wrap DistributedCache.getLocalCacheFiles in a class which I mock in my unit test. I set up a stub so that the real method is not called and a local path is returned instead. But it turns out that the real method is still called and a FileNotFoundException is thrown.
Here's what my MapReduce program looks like:
public class TopicByTime implements Tool {
    private static Map<String, String> topicList = null;

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new TopicByTime(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job();
        /* Job setup */
        DistributedCache.addCacheFile(new URI(/* path on hdfs */), conf);
        job.waitForCompletion(true);
        return 0;
    }

    protected static class TimeMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void setup(Context context) throws IOException, InterruptedException {
            DistributedCacheClass cache = new DistributedCacheClass();
            Path[] localPaths = cache.getLocalCacheFiles(context.getConfiguration());
            if (null == localPaths || 0 == localPaths.length) {
                throw new FileNotFoundException("Distributed cached file not found");
            }
            topicList = Utils.loadTopics(localPaths[0].toString());
        }

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            /* do map */
        }
    }

    /* Reducer and overriding methods */
}
And my test program:
public class TestTopicByTime {
    @Before
    public void setUp() throws IOException {
        Path[] localPaths = { new Path("resource/test/topic_by_time.txt") };
        Configuration conf = Mockito.mock(Configuration.class);
        DistributedCacheClass cache = Mockito.mock(DistributedCacheClass.class);
        when(cache.getLocalCacheFiles(conf)).thenReturn(localPaths);
    }

    @Test
    public void testMapper() {
    }

    @Test
    public void testReducer() {
    }

    @Test
    public void testMapReduce() {
    }
}
DistributedCacheClass is a simple wrapper:
public class DistributedCacheClass {
    public Path[] getLocalCacheFiles(Configuration conf) throws IOException {
        return DistributedCache.getLocalCacheFiles(conf);
    }
}
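To make the structure easier to reproduce, here is a stripped-down sketch of the same arrangement with no Hadoop or Mockito dependency. All names in it (CacheWrapper, StubCacheWrapper, MapperSketch, StubDemo) are stand-ins I made up for illustration, and the hand-written stub only approximates what the Mockito mock does. As far as I can tell it behaves like my real test: a stub exists, but setup() still ends in FileNotFoundException.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Stand-in for DistributedCacheClass: finds no cached files when run locally.
class CacheWrapper {
    String[] getLocalCacheFiles() throws IOException {
        return new String[0];
    }
}

// Hand-written stub playing the role of the Mockito mock in setUp().
class StubCacheWrapper extends CacheWrapper {
    @Override
    String[] getLocalCacheFiles() {
        return new String[] { "resource/test/topic_by_time.txt" };
    }
}

// Stand-in for TimeMapper: setup() creates its own wrapper, as in my mapper.
class MapperSketch {
    String loadedPath;

    void setup() throws IOException {
        CacheWrapper cache = new CacheWrapper();
        String[] localPaths = cache.getLocalCacheFiles();
        if (localPaths == null || localPaths.length == 0) {
            throw new FileNotFoundException("Distributed cached file not found");
        }
        loadedPath = localPaths[0];
    }
}

class StubDemo {
    // Returns true if setup() throws FileNotFoundException although a stub was created.
    static boolean setupFailsDespiteStub() {
        try {
            // The stub is created and works on its own, like the mock in setUp().
            CacheWrapper stub = new StubCacheWrapper();
            if (stub.getLocalCacheFiles().length != 1) {
                return false;
            }
            // The mapper is then exercised without ever receiving the stub.
            new MapperSketch().setup();
            return false;
        } catch (FileNotFoundException e) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

In this sketch the stub and the object used inside setup() are two different instances, which may or may not be the same thing that goes wrong with my Mockito mock.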
I could have added a flag in the Mapper's setup method so that a local path is read when testing, but I want to keep test code separate from my MapReduce program.
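For reference, the flag approach I'm trying to avoid would look roughly like the sketch below. It uses a plain Map in place of Hadoop's Configuration so that it runs on its own. The class FlagSketch, the method resolveTopicPath and the key topicbytime.test.local.path are all hypothetical names, not real Hadoop properties or anything from my actual code.

```java
import java.util.HashMap;
import java.util.Map;

class FlagSketch {
    // Hypothetical configuration key, invented for this example only.
    static final String TEST_PATH_KEY = "topicbytime.test.local.path";

    // Picks the file that setup() would load the topics from.
    static String resolveTopicPath(Map<String, String> conf, String[] cachedPaths) {
        String testPath = conf.get(TEST_PATH_KEY);
        if (testPath != null) {
            // Test-only branch that ends up living inside production code.
            return testPath;
        }
        if (cachedPaths == null || cachedPaths.length == 0) {
            throw new IllegalStateException("Distributed cached file not found");
        }
        return cachedPaths[0];
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TEST_PATH_KEY, "resource/test/topic_by_time.txt");
        // With the flag set, the cached paths are ignored entirely.
        System.out.println(resolveTopicPath(conf, null));
    }
}
```

It works, but the mapper then carries a branch that exists only for the test, which is the coupling I'd like to avoid.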
I'm new to mock testing and MRUnit, so there could be newbie bugs in my program. Please point them out and I'll fix them and post my updates below.