
I've written a MapReduce program in Java, which I can submit to a remote cluster running in distributed mode. Currently, I submit the job using the following steps:

  1. Export the MapReduce job as a jar (e.g. myMRjob.jar)
  2. Submit the job to the remote cluster using the following shell command: hadoop jar myMRjob.jar

I would like to submit the job directly from Eclipse when I run the program. How can I do this?

I am currently using CDH3, and an abridged version of my conf is:

conf.set("hbase.zookeeper.quorum", getZookeeperServers());
conf.set("fs.default.name","hdfs://namenode/");
conf.set("mapred.job.tracker", "jobtracker:jtPort");
Job job = new Job(conf, "COUNT ROWS");
job.setJarByClass(CountRows.class);

// Set up Mapper
TableMapReduceUtil.initTableMapperJob(inputTable, scan, 
    CountRows.MyMapper.class, ImmutableBytesWritable.class,  
    ImmutableBytesWritable.class, job);  

// Set up Reducer
job.setReducerClass(CountRows.MyReducer.class);
job.setNumReduceTasks(16);

// Setup Overall Output
job.setOutputFormatClass(MultiTableOutputFormat.class);

job.submit();

When I run this directly from Eclipse, the job is launched but Hadoop cannot find the mappers/reducers. I get the following errors:

12/06/27 23:23:29 INFO mapred.JobClient:  map 0% reduce 0%
12/06/27 23:23:37 INFO mapred.JobClient: Task Id : attempt_201206152147_0645_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mypkg.mapreduce.CountRows$MyMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:212)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:602)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
...

Does anyone know how to get past these errors? If I can fix this, I can integrate more MR jobs into my scripts which would be awesome!

Tucker
  • Tucker - I was able to run the Hadoop job in standalone mode, but not in other modes, from Eclipse. I posted the query in the Hadoop forums some time back and there was no positive response. BTW, Hadoop runs in standalone mode without any configuration files (default parameters). – Praveen Sripati Jun 28 '12 at 01:25
  • When you submit the job from within Eclipse, are the mapper / reducer classes in the same project, or is the jar containing them on the classpath, and the classes themselves nowhere else on the cp? – Chris White Jun 28 '12 at 01:41
  • @ChrisWhite The class containing everything is called CountRows. This class contains a 'main' method which sets the job configurations. The CountRows class also contains the classes for the mapper and reducer, called MyMapper and MyReducer respectively. As I said, the job works fine when I launch it from the command line with 'hadoop jar CountRows.jar'. – Tucker Jun 28 '12 at 14:26
  • That's not my question: when you submit the job in Eclipse, is the CountRows.jar on the classpath, or are you submitting the job from within the CountRows project (hence the class files are not bundled into a jar)? – Chris White Jun 28 '12 at 15:30
  • @ChrisWhite I'm submitting the job from within the CountRows project. – Tucker Jun 28 '12 at 16:01

3 Answers


If you're submitting the Hadoop job from within the Eclipse project that defines the job's classes, then you most probably have a classpath problem.

The job.setJarByClass(CountRows.class) call is finding the class file on the build classpath, not in CountRows.jar (which may or may not have been built yet, or may not be on the classpath at all).

You should be able to confirm this by printing out the result of job.getJar() after you call job.setJarByClass(..); if it doesn't display a jar file path, then Hadoop has found the build class rather than the jar'd class.
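A minimal sketch of that check, reusing the CDH3-era job setup from the question:

Job job = new Job(conf, "COUNT ROWS");
job.setJarByClass(CountRows.class);
// Prints a jar file path when the class was loaded from a jar;
// prints null when running from loose class files in Eclipse.
System.out.println("job jar: " + job.getJar());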

Chris White
  • Ok, I ran "job.setJarByClass(CountRows.class); System.out.println("getClass... : "+job.getClass());" and the result was simply "getClass... : class org.apache.hadoop.mapreduce.Job" – Tucker Jun 28 '12 at 16:26
  • Well that's unsurprising considering you ask for the class type of Job. Try `job.getJar()` rather than `job.getClass()` – Chris White Jun 28 '12 at 16:34
  • Ha, you're right - wrong method! I ran it now; it says it's null. Is there a way to get it to work when I run the program through Eclipse? – Tucker Jun 28 '12 at 17:43
  • If you've built the jar, you can hard-code the jar location by setting mapred.jar to the local path of the jar file. – Chris White Jun 28 '12 at 17:54
  • Ok, that worked! I added `conf.set("mapred.jar","/path/to/my/jar/CountRows.jar");` (see the sketch after these comments). So there is no way to run the MapReduce job in Eclipse without exporting the MR class from Eclipse into its own jar and specifying it in the job configuration as above? – Tucker Jun 28 '12 at 18:11
  • btw, a simple yes or no and I can mark this question as answered :-) – Tucker Jun 28 '12 at 19:00
  • Not that I know of. Unless the jar is already in HDFS, you have to get the classes to the jobtracker some way, and a single jar is the current method, so no. – Chris White Jun 28 '12 at 19:21
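A minimal sketch of the workaround from the comments above, reusing the configuration from the question (the jar path is the placeholder from the comment; substitute wherever you export CountRows.jar):

Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", getZookeeperServers());
// Point the job at a pre-built local jar so the tasktrackers can load
// CountRows$MyMapper and CountRows$MyReducer.
conf.set("mapred.jar", "/path/to/my/jar/CountRows.jar");
Job job = new Job(conf, "COUNT ROWS");
// In Eclipse, setJarByClass finds no containing jar, so the explicit
// mapred.jar setting above is what gets shipped to the cluster.
job.setJarByClass(CountRows.class);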

What worked for me was exporting a runnable JAR (unlike a plain JAR, a runnable JAR defines the main class) and selecting the "package required libraries into JAR" option. (Choosing the "extract required libraries" option led to duplicate-entry errors, and it also had to extract the class files from the jars, which in my case still did not resolve the ClassNotFoundException.)

After that, you can just set the jar, as Chris White suggested. On Windows it would look like this: job.setJar("C:\\MyJar.jar");
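A minimal sketch of that call in context, assuming a Hadoop 2.x setup like the one this answer describes (the path and job name are examples):

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "COUNT ROWS");
// Absolute local path to the exported runnable JAR; Windows backslashes
// must be escaped in Java string literals.
job.setJar("C:\\MyJar.jar");
job.submit();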

If it helps anybody, I made a presentation on what I learned from creating a MapReduce project and running it in Hadoop 2.2.0 on Windows 7 (in Eclipse Luna).

Peeter Kokk

I used the method from the following website to configure a Map/Reduce project of mine to run from Eclipse (without exporting the project as a JAR): Configuring Eclipse to run Hadoop Map/Reduce project

Note: If you decide to debug your program, your Mapper class and Reducer class won't be debuggable.

Hope it helps. :)

  • You should provide a summary of the solution, not just a link that could go away. – mmmmmm Sep 18 '12 at 15:52
  • You said "If you decide to debug your program, your Mapper class and Reducer class won't be debuggable." Why is this, and is it always true? – John Sep 11 '13 at 14:32