
With Apache MRUnit I'm able to unit test my MapReduce program locally before running it on the cluster.

My program needs to read from DistributedCache, so I wrapped DistributedCache.getLocalCacheFiles in a class that I mock in my unit test. I set up a stub so that the real method is not called and a local path is returned instead. But it turns out that the real method is still called, and a FileNotFoundException is thrown.

Here's what my MapReduce program looks like:

public class TopicByTime implements Tool {
  private static Map<String, String> topicList = null;

  public static void main(String[] args) throws Exception {
   System.exit(ToolRunner.run(new TopicByTime(), args));    
  }

  @Override
  public int run(String[] args) throws Exception {
    Job job = new Job();
    /* Job setup */
    DistributedCache.addCacheFile(new URI(/* path on hdfs */), conf);
    job.waitForCompletion(true);
    return 0;
  }

 protected static class TimeMapper extends Mapper<LongWritable, Text, Text, Text> {    
   @Override
   public void setup(Context context) throws IOException, InterruptedException {
     DistributedCacheClass cache = new DistributedCacheClass();
     Path[] localPaths = cache.getLocalCacheFiles(context.getConfiguration());
     if (null == localPaths || 0 == localPaths.length) {
       throw new FileNotFoundException("Distributed cached file not found");
     }
     topicList = Utils.loadTopics(localPaths[0].toString());

   }

   @Override
   public void map(LongWritable key, Text value, Context context) 
      throws IOException, InterruptedException {
      /* do map */
   }
  }

  /* Reducer and overriding methods */
} 

And my test program

public class TestTopicByTime {

  @Before
  public void setUp() throws IOException {
    Path[] localPaths = { new Path("resource/test/topic_by_time.txt")};
    Configuration conf = Mockito.mock(Configuration.class);
    DistributedCacheClass cache = Mockito.mock(DistributedCacheClass.class);
    when(cache.getLocalCacheFiles(conf)).thenReturn(localPaths); 
   }

  @Test
  public void testMapper() {
  }

  @Test
  public void testReducer() {
  }

  @Test
  public void testMapReduce() {
  }
}

DistributedCacheClass is a simple wrapper

public class DistributedCacheClass {
  public Path[] getLocalCacheFiles(Configuration conf) throws IOException {
    return DistributedCache.getLocalCacheFiles(conf);
  }
}

I could have added a flag to the Mapper's setup method so that a local path is read when testing, but I want to keep test code separate from my MapReduce program.

I'm new to mock testing and MRUnit, so there could be newbie bugs in my program. Please point them out and I'll fix them and post my updates below.

manuzhang

1 Answer


In your TestTopicByTime.setUp() change the line

when(cache.getLocalCacheFiles(conf)).thenReturn(localPaths);

to

when(cache.getLocalCacheFiles(any(Configuration.class))).thenReturn(localPaths);

When stubbing, Mockito matches arguments by equality. They are not equal in your case: your code passes the real Configuration from context.getConfiguration(), which does not equal the mock Configuration you created in the test. So you need to use the argument matcher any() to avoid exact object comparison.

Edit: Also, in your TimeMapper.setup() you are creating a new cache, DistributedCacheClass cache = new DistributedCacheClass();

so you are not using the mock object you created at all. You need to be able to inject your mock cache into TimeMapper instead. You can pass the cache object into TimeMapper from outside, ideally via the constructor, so that your test can pass it a mock cache object.
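A minimal sketch of that constructor injection, written in plain Java so it runs without Hadoop or Mockito on the classpath. The names CacheLocator, ClusterCacheLocator, FakeCacheLocator, TimeMapperSketch and InjectionSketch are hypothetical stand-ins, not part of the original program, and paths are plain strings rather than Path objects. Since Hadoop creates mapper instances reflectively through a no-argument constructor, the sketch keeps one that falls back to the production dependency, while the test uses the second constructor.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical abstraction over the DistributedCacheClass wrapper.
interface CacheLocator {
    String[] getLocalCacheFiles() throws IOException;
}

// Production side. In the real program this would delegate to
// DistributedCache.getLocalCacheFiles(conf); it is stubbed here (assumption)
// so that the sketch has no Hadoop dependency.
class ClusterCacheLocator implements CacheLocator {
    @Override
    public String[] getLocalCacheFiles() throws IOException {
        return new String[0]; // nothing is cached outside a cluster
    }
}

// Test double that returns whatever local paths the test supplies.
class FakeCacheLocator implements CacheLocator {
    private final String[] paths;

    FakeCacheLocator(String... paths) {
        this.paths = paths;
    }

    @Override
    public String[] getLocalCacheFiles() {
        return paths;
    }
}

// Simplified stand-in for TimeMapper: the dependency arrives from outside.
class TimeMapperSketch {
    private final CacheLocator cache;
    private String topicFile;

    // No-arg constructor for the framework; uses the production dependency.
    TimeMapperSketch() {
        this(new ClusterCacheLocator());
    }

    // Constructor for tests; accepts any CacheLocator, including a fake or a mock.
    TimeMapperSketch(CacheLocator cache) {
        this.cache = cache;
    }

    void setup() throws IOException {
        String[] localPaths = cache.getLocalCacheFiles();
        if (localPaths == null || localPaths.length == 0) {
            throw new FileNotFoundException("Distributed cached file not found");
        }
        topicFile = localPaths[0]; // the real mapper would load the topics here
    }

    String topicFile() {
        return topicFile;
    }
}

class InjectionSketch {
    public static void main(String[] args) throws IOException {
        TimeMapperSketch mapper =
            new TimeMapperSketch(new FakeCacheLocator("resource/test/topic_by_time.txt"));
        mapper.setup();
        System.out.println(mapper.topicFile());
    }
}
```

With this shape the mapper needs no test flag: the test decides which dependency to hand in, and the production path is untouched. In an MRUnit test the same idea would apply by constructing the mapper with the fake (or a Mockito mock) before handing it to the driver.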

Gopi
  • throws the same `FileNotFoundException` – manuzhang Mar 28 '13 at 05:06
  • That will mingle test code with my MapReduce program. Then why use mocking in the first place? I could pass in a flag via the constructor and read from a local path in `setup` if the flag is true. – manuzhang Mar 28 '13 at 16:01
  • Not sure what you are saying, but your test code will never go into your program. Of course, writing tests does affect design (in a good way), but it never interferes. Writing unit tests for a given class requires all of its dependencies to be mocked out, and to make that possible you must be able to inject the dependencies from outside. I would suggest you first go through a quick tutorial on unit testing or TDD; that will help you design your classes so that they are testable. – Gopi Mar 28 '13 at 16:50
  • Thanks for your advice. That's what I'm about to do. My current unit test uses MRUnit without mocking. I have a `test` flag (false by default) and set it to true through the Mapper's constructor when testing. Then in `setup` the file is read from a local path. Otherwise, when the program is running on the cluster, `test` is false and the file is read from `DistributedCache`. Does this look OK to you? I just want to get rid of that flag and the reading-from-local-path code, which are only for testing. – manuzhang Mar 28 '13 at 19:04
  • With a testable class design, you should be able to do away with such flags and keep any test-related code out of your program. – Gopi Mar 29 '13 at 03:51
  • would you please explain a bit about `testable class design`? Any reference is appreciated – manuzhang Mar 29 '13 at 04:54
  • Well, I can't find one single place that has it all; most people say it is hard to explain but easy once experienced. I hope the links below help you: http://www.softwaretestingmagazine.com/knowledge/guidelines-for-java-testable-design/ http://www.objectmentor.com/resources/articles/TestableJava.pdf http://stackoverflow.com/questions/1468547/designing-constructors-for-testability http://debasishg.blogspot.in/2007/03/making-classes-unit-testable.html – Gopi Mar 29 '13 at 05:34
  • nice articles. I see what you mean now. Constructor injection is what I need and I could create a FakeDistributedCacheClass in my UT. Thanks – manuzhang Mar 30 '13 at 01:17