
I am trying to run my PDFWordCount map-reduce program on Hadoop 2.2.0, but I get this error:

13/12/25 23:37:26 INFO mapreduce.Job: Task Id : attempt_1388041362368_0003_m_000009_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class PDFWordCount$MyMap not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:721)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class PDFWordCount$MyMap not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
    ... 8 more

It says that my map class is not found. I have a cluster with a namenode and 2 datanodes on 3 VMs.

My main function is this:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    @SuppressWarnings("deprecation")
    Job job = new Job(conf, "wordcount");

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(MyMap.class);
    job.setReducerClass(MyReduce.class);

    job.setInputFormatClass(PDFInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setJarByClass(PDFWordCount.class);
    job.waitForCompletion(true);
  }

If I run my jar using this command:

yarn jar myjar.jar PDFWordCount /in /out

it takes /in as the output path and gives me the error above, even though I have job.setJarByClass(PDFWordCount.class); in my main function, as you can see.

I have run a simple WordCount project whose main function is exactly like this one. To run it, I used yarn jar wc.jar MyWordCount /in2 /out2 and it ran flawlessly.

I can't understand what the problem is!

UPDATE: I tried to move my work from this project into the WordCount project I had used successfully. I created a package, copied the related files from the PDFWordCount project into it, and exported the project (my main method was not changed, apart from using PDFInputFormat, so I did nothing except move the java files into the new package). It didn't work. I deleted the files from the other project, but it didn't work. I moved the java files back to the default package, but it didn't work either!

What's wrong?!
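One detail worth checking: the task JVM looks the mapper up by name, and an inner class like MyMap lives in the jar as a separate top-level entry named PDFWordCount$MyMap.class (so jar tf myjar.jar should list it). A minimal, self-contained sketch of that entry naming — the jar built below is synthetic, not the real myjar.jar:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

public class InnerClassEntryCheck {
    public static void main(String[] args) throws IOException {
        // Build a throwaway jar with the entry names a correct export of
        // PDFWordCount would contain (entry bodies are empty; only the
        // names matter for this check).
        File jar = File.createTempFile("check", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
            for (String name : new String[] {
                    "PDFWordCount.class",
                    "PDFWordCount$MyMap.class",
                    "PDFWordCount$MyReduce.class" }) {
                out.putNextEntry(new ZipEntry(name));
                out.closeEntry();
            }
        }
        // The task-side classloader resolves MyMap via this entry name; if it
        // is missing from the real job jar, you get ClassNotFoundException.
        try (JarFile jf = new JarFile(jar)) {
            System.out.println(jf.getEntry("PDFWordCount$MyMap.class") != null); // prints true
        }
    }
}
```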

Mehraban
  • Did your jar file have the class file of MyMap? It should be in your jar file as PDFWordCount$MyMap.class. Try to check manually. I am not sure, but Maven should solve the problem. – saurabh shashank Dec 26 '13 at 11:15
  • My map and reduce classes are in the same file as the main method. – Mehraban Dec 26 '13 at 11:17
  • Can you open the jar myjar.jar and check whether you can find PDFWordCount$MyMap.class, PDFWordCount$MyReduce.class and PDFWordCount.class? – saurabh shashank Dec 26 '13 at 11:18
  • What is the visibility of the definition of the inner PDFWordCount.MyMap class? It should be public static. Also, what is the value of your `hadoop classpath` or $HADOOP_CLASSPATH? – jtravaglini Dec 30 '13 at 17:48
  • @jtravaglini Where should I look for it? – Mehraban Jan 01 '14 at 06:41
  • I have the same issue, but I am trying to run this through Spring Batch JobLauncher code. I picked up the simple word-count example from http://docs.spring.io/spring-hadoop/docs/1.0.1.RC1/reference/html/batch-wordcount.html. Should I post a new question? What artefacts/information are required for analysis? – GSG Jul 02 '14 at 20:27
  • @GSG Did you try my answer? – Mehraban Jul 02 '14 at 20:47
  • I did not try your answer, as my requirement is slightly different: I am trying to invoke this job via a JobLauncher. The launcher code I am using is https://github.com/spring-projects/spring-data-book/blob/master/hadoop/batch-wordcount/src/main/java/com/oreilly/springdata/hadoop/wordcount/WordCount.java. Can't I run this example simply through a java command? My application will ultimately expose this workflow through a web service. Let me know your thoughts. Thanks. – GSG Jul 03 '14 at 06:38
  • @GSG I think you should ask this as a new question on SO, since answering it requires some information about your config, code, and the error you got. – Mehraban Jul 03 '14 at 07:24
  • OK, thanks. Shall do that. – GSG Jul 03 '14 at 09:29

1 Answer


I found a way to overcome this problem, even though I couldn't work out what the actual problem was.

When I want to export my java project as a jar file in eclipse, I have two options:

  1. Extract required libraries into generated JAR
  2. Package required libraries into generated JAR

I don't know exactly what the difference is, or whether it is a big deal. I used to choose the second option, but if I choose the first option, I can run my job using this command:

yarn jar pdf.jar /in /out
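One plausible explanation (an assumption on my part, not something verified in this case): "Extract required libraries" copies each library's .class files into the top level of the generated jar, while "Package required libraries" nests the whole library jars inside it (together with Eclipse's jar-in-jar loader), and entries inside a nested jar are not visible to an ordinary classpath lookup. A self-contained sketch of that layout difference — all entry names below are synthetic:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

public class ExportOptionsDemo {
    // Write a jar containing the given entry names (bodies empty; only the
    // entry layout matters for this demonstration).
    static File jarWith(String... entries) throws IOException {
        File f = File.createTempFile("demo", ".jar");
        f.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(f))) {
            for (String name : entries) {
                out.putNextEntry(new ZipEntry(name));
                out.closeEntry();
            }
        }
        return f;
    }

    static boolean hasEntry(File jar, String name) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getEntry(name) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // "Extract required libraries": dependency classes are inlined as
        // top-level entries next to your own classes.
        File extracted = jarWith("PDFWordCount.class",
                                 "PDFWordCount$MyMap.class",
                                 "org/apache/pdfbox/pdmodel/PDDocument.class");
        // "Package required libraries": whole library jars are nested inside,
        // so their classes are not top-level entries of the job jar.
        File packaged = jarWith("PDFWordCount.class",
                                "PDFWordCount$MyMap.class",
                                "lib/pdfbox.jar");
        String dep = "org/apache/pdfbox/pdmodel/PDDocument.class";
        System.out.println(hasEntry(extracted, dep)); // prints true
        System.out.println(hasEntry(packaged, dep));  // prints false
    }
}
```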
Mehraban