
I am trying to run my PDFWordCount map-reduce program on Hadoop 2.2.0, but I get this error:

13/12/25 23:37:26 INFO mapreduce.Job: Task Id : attempt_1388041362368_0003_m_000009_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class PDFWordCount$MyMap not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:721)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class PDFWordCount$MyMap not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
    ... 8 more

It says that my map class is not found. I have a cluster with a namenode and 2 datanodes on 3 VMs.

My main function is this:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    @SuppressWarnings("deprecation")
    Job job = new Job(conf, "wordcount");

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(MyMap.class);
    job.setReducerClass(MyReduce.class);

    job.setInputFormatClass(PDFInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setJarByClass(PDFWordCount.class);
    job.waitForCompletion(true);
  }

If I run my jar using this command:

yarn jar myjar.jar PDFWordCount /in /out

it takes /in as the output path and gives me the error above, even though I have job.setJarByClass(PDFWordCount.class); in my main function, as you can see.

I have run a simple WordCount project whose main function is exactly like this one. To run it, I used yarn jar wc.jar MyWordCount /in2 /out2 and it ran flawlessly.

I can't understand what the problem is!

UPDATE: I tried to move my work from this project into the WordCount project I had used successfully. I created a package, copied the related files from the PDFWordCount project into it, and exported the project (my main method was not changed, apart from using PDFInputFormat, so I did nothing except move the java files into the new package). It didn't work. I deleted the files from the other project, but it didn't work. I moved the java files back to the default package, but it didn't work either!

What's wrong?!
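One detail worth checking: the task JVM looks the mapper up by name, and an inner class like MyMap lives in the jar as a separate top-level entry named PDFWordCount$MyMap.class (so jar tf myjar.jar should list it). A minimal, self-contained sketch of that entry naming — the jar built below is synthetic, not the real myjar.jar:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

public class InnerClassEntryCheck {
    public static void main(String[] args) throws IOException {
        // Build a throwaway jar with the entry names a correct export of
        // PDFWordCount would contain (entry bodies are empty; only the
        // names matter for this check).
        File jar = File.createTempFile("check", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
            for (String name : new String[] {
                    "PDFWordCount.class",
                    "PDFWordCount$MyMap.class",
                    "PDFWordCount$MyReduce.class" }) {
                out.putNextEntry(new ZipEntry(name));
                out.closeEntry();
            }
        }
        // The task-side classloader resolves MyMap via this entry name; if it
        // is missing from the real job jar, you get ClassNotFoundException.
        try (JarFile jf = new JarFile(jar)) {
            System.out.println(jf.getEntry("PDFWordCount$MyMap.class") != null); // prints true
        }
    }
}
```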

Mehraban
  • Did your jar file have the class file of MyMap? It should be in your jar file as PDFWordCount$MyMap.class. Try to check manually. I am not sure, but Maven should solve the problem. – saurabh shashank Dec 26 '13 at 11:15
  • My map and reduce classes are in the same file as the main method. – Mehraban Dec 26 '13 at 11:17
  • Can you open the jar myjar.jar and check whether you can find PDFWordCount$MyMap.class, PDFWordCount$MyReduce.class and PDFWordCount.class? – saurabh shashank Dec 26 '13 at 11:18
  • What is the visibility of the definition of the inner PDFWordCount.MyMap class? It should be public static. Also, what is the value of your `hadoop classpath` or $HADOOP_CLASSPATH? – jtravaglini Dec 30 '13 at 17:48
  • @jtravaglini Where should I look for it? – Mehraban Jan 01 '14 at 06:41
  • I have the same issue, but I am trying to run this through Spring Batch JobLauncher code. I picked up the simple word-count example from http://docs.spring.io/spring-hadoop/docs/1.0.1.RC1/reference/html/batch-wordcount.html. Should I post a new question? What artefacts/information are required for analysis? – GSG Jul 02 '14 at 20:27
  • @GSG Did you try my answer? – Mehraban Jul 02 '14 at 20:47
  • I did not try your answer, as my requirement is slightly different: I am trying to invoke this job via a JobLauncher. The launcher code I am using is https://github.com/spring-projects/spring-data-book/blob/master/hadoop/batch-wordcount/src/main/java/com/oreilly/springdata/hadoop/wordcount/WordCount.java. Can't I run this example simply through a java command? My application will ultimately expose this workflow through a web service. Let me know your thoughts. Thanks. – GSG Jul 03 '14 at 06:38
  • @GSG I think you should ask this as a new question on SO, since answering it requires some information about your config, code, and the error you got. – Mehraban Jul 03 '14 at 07:24
  • OK, thanks. Shall do that. – GSG Jul 03 '14 at 09:29

1 Answer


I found a way to overcome this problem, even though I couldn't work out what the actual problem was.

When I want to export my java project as a jar file in eclipse, I have two options:

  1. Extract required libraries into generated JAR
  2. Package required libraries into generated JAR

I don't know exactly what the difference is, or whether it is a big deal. I used to choose the second option, but if I choose the first option, I can run my job using this command:

yarn jar pdf.jar /in /out
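One plausible explanation (an assumption on my part, not something verified in this case): "Extract required libraries" copies each library's .class files into the top level of the generated jar, while "Package required libraries" nests the whole library jars inside it (together with Eclipse's jar-in-jar loader), and entries inside a nested jar are not visible to an ordinary classpath lookup. A self-contained sketch of that layout difference — all entry names below are synthetic:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

public class ExportOptionsDemo {
    // Write a jar containing the given entry names (bodies empty; only the
    // entry layout matters for this demonstration).
    static File jarWith(String... entries) throws IOException {
        File f = File.createTempFile("demo", ".jar");
        f.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(f))) {
            for (String name : entries) {
                out.putNextEntry(new ZipEntry(name));
                out.closeEntry();
            }
        }
        return f;
    }

    static boolean hasEntry(File jar, String name) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getEntry(name) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // "Extract required libraries": dependency classes are inlined as
        // top-level entries next to your own classes.
        File extracted = jarWith("PDFWordCount.class",
                                 "PDFWordCount$MyMap.class",
                                 "org/apache/pdfbox/pdmodel/PDDocument.class");
        // "Package required libraries": whole library jars are nested inside,
        // so their classes are not top-level entries of the job jar.
        File packaged = jarWith("PDFWordCount.class",
                                "PDFWordCount$MyMap.class",
                                "lib/pdfbox.jar");
        String dep = "org/apache/pdfbox/pdmodel/PDDocument.class";
        System.out.println(hasEntry(extracted, dep)); // prints true
        System.out.println(hasEntry(packaged, dep));  // prints false
    }
}
```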
Mehraban