Is there a way to run Hadoop MapReduce unit tests on Windows without a Hadoop setup? Does MRUnit run on Windows (without Cygwin) in Eclipse as a Java Maven project?

Thanks, Srivatsan Nallazhagappan

1 Answer

You can run standalone MRUnit tests. All you need are a few dependencies in your pom. I ran a quick test, just a simple one with hard-coded values, and these were the only dependencies I needed for it to pass. No Cygwin, no Hadoop setup, just the dependencies.

<dependencies>
    <dependency>
        <groupId>jdk.tools</groupId>
        <artifactId>jdk.tools</artifactId>
        <version>1.7.0_25</version>
        <scope>system</scope>
        <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-common</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mrunit</groupId>
        <artifactId>mrunit</artifactId>
        <version>1.0.0</version>
        <classifier>hadoop2</classifier>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.2.0</version>
    </dependency>
</dependencies>
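
For illustration, a simple test with hard-coded values of the kind described above might look like the following sketch. The mapper `WordLengthMapper` and the test class name are hypothetical, and JUnit 4 is assumed to be available on the test classpath in addition to the dependencies listed above.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class WordLengthMapperTest {

    // Hypothetical mapper, defined inline only to keep the example self-contained:
    // it emits each input line (trimmed) together with its length.
    public static class WordLengthMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String word = value.toString().trim();
            context.write(new Text(word), new IntWritable(word.length()));
        }
    }

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        // MRUnit drives the mapper in-process; no cluster or HDFS is started.
        mapDriver = MapDriver.newMapDriver(new WordLengthMapper());
    }

    @Test
    public void emitsWordWithItsLength() throws IOException {
        // Hard-coded input and expected output; runTest() fails the test
        // if the mapper's actual output differs.
        mapDriver.withInput(new LongWritable(0), new Text("hadoop"))
                 .withOutput(new Text("hadoop"), new IntWritable(6))
                 .runTest();
    }
}
```

In Eclipse this would be run as an ordinary JUnit test; `ReduceDriver` and `MapReduceDriver` in the same MRUnit package follow the same `withInput` / `withOutput` / `runTest` pattern.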

As for running a Hadoop setup without Cygwin, that's also possible. Have a look at this site and this site for help with building and installing Hadoop for Windows without Cygwin.

Another helpful tool is the Hadoop plugin for Eclipse. You can see a compiled version here. It's pretty easy to use. You can get some help on how to use it here.

Paul Samsotha
  • Thanks. It works great. This is cool; I see MRUnit also supports the distributed cache. As an extension of my original question (I will create a separate post if required): I assume MRUnit cannot mock HDFS read/write calls made within the map/setup method (I understand it's not generally recommended, but unfortunately I have read calls in my setup method). – Srivatsan Nallazhagappan May 19 '14 at 04:31
  • You can use a `MiniMRCluster` and `MiniDFSCluster` as discussed [here](http://grepalex.com/2012/10/20/hadoop-unit-testing-with-minimrcluster/). You'll need the `hadoop-test-x.y.z.jar` for this. I've just set it up in Eclipse. Let me know if you want further information. – Paul Samsotha Jun 17 '14 at 06:46
  • Thanks. Will try them out. I found that MRUnit does not seem to fully support the distributed cache (the symlink feature, for example, is not available), at least in the 1.0.0 version I am using. – Srivatsan Nallazhagappan Jun 19 '14 at 10:39