
I am using Jsoup.jar to get the keywords from the meta tag of several websites using MapReduce. The list of websites is kept in a txt file. However, when I compile the Java file in the terminal, it says that package org.jsoup.Jsoup does not exist. I made sure that the jar is in the same folder as the Java file.

Screenshot of error: [terminal screenshot showing the "package org.jsoup.Jsoup does not exist" compile error]
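
For reference, a minimal sketch of the kind of mapper the question describes, assuming each line of the input txt file is a single URL and the tag being read is <meta name="keywords">; the class name KeywordMapper and the overall shape are illustrative, not the original code:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    // Illustrative sketch: one input line = one URL; emits (url, keywords).
    public class KeywordMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String url = value.toString().trim();
            if (url.isEmpty()) {
                return;
            }
            Document doc;
            try {
                // Fetch the page; skip URLs that cannot be retrieved.
                doc = Jsoup.connect(url).get();
            } catch (IOException e) {
                return;
            }
            // Read <meta name="keywords" content="...">, if present.
            Element meta = doc.select("meta[name=keywords]").first();
            if (meta != null) {
                context.write(new Text(url), new Text(meta.attr("content")));
            }
        }
    }

A class like this only compiles if the Jsoup jar is on the javac classpath along with the Hadoop jars (for example javac -cp .:jsoup.jar:<hadoop jars> KeywordMapper.java, with paths adjusted to the actual setup), which is what the "package does not exist" message is complaining about.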

Tashley
  • You need to add the library to the build path. – nbokmans Dec 20 '16 at 16:04
  • I wrote the code using Eclipse on Windows. Then I sent the Java file to my Linux machine, where I'll actually run the code. That's where I'm getting those errors. @nbokmans – Tashley Dec 20 '16 at 16:12
  • You have to make sure that the required jar libraries are on the classpath; have a look at this question: http://stackoverflow.com/questions/26748811/setting-external-jars-to-hadoop-classpath – Arnaud Dec 20 '16 at 16:16
  • @Berger: I added the jar library to the **hadoop/lib** folder. It compiled successfully, and hence I was able to generate the jar later. However, when I feed my MapReduce algorithm the input file, I get the following error (see the driver sketch below these comments): 16/12/20 02:33:31 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. attempt_201610201307_0008_m_000000_0, Status : FAILED Error: java.lang.ClassNotFoundException: org.jsoup.Jsoup at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Me... – Tashley Dec 21 '16 at 08:04
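
Following up on the runtime failure in the last comment: the WARN line ("Applications should implement Tool") means the driver is not going through GenericOptionsParser, so generic options such as -libjars are ignored even if they are passed. A hedged sketch of a driver that does implement Tool (class names and the job name are illustrative, and KeywordMapper refers to the mapper sketch above):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Illustrative driver: ToolRunner applies GenericOptionsParser, so
    // "-libjars /path/to/jsoup.jar" is honoured and the Jsoup jar is shipped
    // to the task nodes, avoiding the ClassNotFoundException at runtime.
    public class KeywordDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            Job job = new Job(getConf(), "meta keyword extraction");
            job.setJarByClass(KeywordDriver.class);
            job.setMapperClass(KeywordMapper.class);
            job.setNumReduceTasks(0);               // map-only example
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new KeywordDriver(), args));
        }
    }

With this shape the job can be launched as, for example, hadoop jar keywords.jar KeywordDriver -libjars /path/to/jsoup.jar input output (jar and path names are hypothetical). A jar dropped into hadoop/lib is only visible on the nodes where it was copied, which would explain why compilation succeeded on that machine but the map tasks still could not find org.jsoup.Jsoup.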

1 Answer


You have to place the jar files in the Distributed Cache; it is a best practice for sharing third-party libraries.

Please have a look at the links below for further help:

http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

https://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/
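
In code, one way to do this on an older MRv1 cluster like the one in the question is the DistributedCache API. The sketch below assumes the Jsoup jar has already been copied to HDFS; the /libs/jsoup.jar path and the class name are hypothetical:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

    // Illustrative helper: registers an HDFS-resident jar on the job's
    // classpath via the distributed cache, so it is shipped to every task.
    public class ThirdPartyJars {
        public static void addJsoup(Configuration conf) throws IOException {
            DistributedCache.addFileToClassPath(new Path("/libs/jsoup.jar"), conf);
        }
    }

This would be called from the driver before the Job is created (for example ThirdPartyJars.addJsoup(getConf()) at the top of run()); the -libjars option from the driver sketch above achieves the same effect without any code change.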

prashant khunt