
I am using MRUnit to write unit tests for my MapReduce jobs.

However, I am having trouble bringing HDFS into the mix. My MR job needs a file from HDFS. How do I mock out the HDFS part in an MRUnit test case?

Edit:

I know that I can specify inputs/expectedOutput for my MR code in the test infrastructure. However, that is not what I want. My MR job needs to read another file that contains the domain data it needs to do its work. This file is in HDFS. How do I mock out this file?

I tried using Mockito, but it didn't work. The reason is that FileSystem.open() returns an FSDataInputStream, which implements other interfaces besides being a java.io.InputStream. It was too painful to mock out all the interfaces. So I hacked it in my code by doing the following:

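// Test-only branch: when the "junit_running" system property is set,
// read the domain data from the classpath instead of HDFS.
if (System.getProperty("junit_running") != null)
{
    inputStream = this.getClass().getClassLoader().getResourceAsStream("domain_data.txt");
    br = new BufferedReader(new InputStreamReader(inputStream));
} else {
    // Production branch: read the domain data from HDFS.
    Path pathToRegionData = new Path("/domain_data.txt");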

    LOG.info("checking for existence of region assignment file at path: " + pathToRegionData.toString());

    if (!fileSystem.exists(pathToRegionData))
    {
        LOG.error("domain file does not exist at path: " + pathToRegionData.toString());
        throw new IllegalArgumentException("region assignments file does not exist at path: " + pathToRegionData.toString());
    }

    inputStream = fileSystem.open(pathToRegionData);

    br = new BufferedReader(new InputStreamReader(inputStream));
}

This solution is not ideal because I had to put test-specific code in my production code. I am still waiting to see if there is a more elegant solution out there.

feroze

1 Answer


Please follow this small tutorial for MRUnit:

https://github.com/malli3131/HadoopTutorial/blob/master/MRUnit/Tutorial

In an MRUnit test case, we supply the data inside the testMapper() and testReducer() methods, so an MRUnit test needs no input from HDFS. Only real MapReduce jobs require data inputs from HDFS.

Naga
  • Thanks for your comment, but this is not what I wanted. I know that I can specify input/expected output in the MRUnit infrastructure. My code reads another HDFS file, and I need to stub it out so that, in the unit test context, I give it a file from the local filesystem. How do I do that? I tried using Mockito to mock the HDFS FileSystem, but it doesn't work all the way, as FSDataInputStream inherits from Seekable and another interface. I can of course mock this whole thing out, but I haven't gone that far yet. – feroze Sep 02 '15 at 21:35
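
One possible way to stub out the file without a test-only branch in production code is to hide "open the domain data" behind a small interface and inject the implementation. The sketch below is only an illustration under assumptions: DomainDataSource, DomainDataLoader and DomainDataSources are hypothetical names that do not come from the question or from MRUnit, and the HDFS-backed variant appears only in a comment because it needs the Hadoop libraries on the classpath. Since FSDataInputStream is a java.io.InputStream, the production source can return it directly and nothing Hadoop-specific has to be mocked.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical seam: all the job needs from HDFS is a plain InputStream.
interface DomainDataSource {
    InputStream open() throws IOException;
}

// Hypothetical loader: the reading logic is identical in production and in tests.
final class DomainDataLoader {
    private final DomainDataSource source;

    DomainDataLoader(DomainDataSource source) {
        this.source = source;
    }

    // Reads every line of the domain data; the stream is closed on exit.
    List<String> readLines() throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(source.open(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}

// Hypothetical factory for sources that need no HDFS at all.
final class DomainDataSources {
    private DomainDataSources() {
    }

    // Serves fixed content from memory; convenient inside a unit test.
    static DomainDataSource inMemory(String content) {
        return () -> new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }

    // Serves a file from the test classpath, e.g. "domain_data.txt".
    static DomainDataSource classpath(String resourceName) {
        return () -> {
            InputStream in = DomainDataSources.class.getClassLoader()
                    .getResourceAsStream(resourceName);
            if (in == null) {
                throw new FileNotFoundException("resource not found: " + resourceName);
            }
            return in;
        };
    }

    // Production variant (not compiled here, requires Hadoop on the classpath):
    //   DomainDataSource hdfs = () -> fileSystem.open(new Path("/domain_data.txt"));
}
```

With something like this, the job would build the loader from the HDFS-backed source, while the MRUnit test would hand it DomainDataSources.classpath("domain_data.txt") or an in-memory source. How the source reaches the mapper or reducer (a setter, a constructor argument, or a class name in the job configuration) is a design choice this sketch leaves open.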