I am running Cascading (actually Scalding) hadoop job that uses DistributedCache for dependent jars.
Fist time it works fine (meaning that the classpath is set up correctly) but then it starts failing with ClassNotFoundException:
java.io.IOException: Split class cascading.tap.hadoop.io.MultiInputSplit not found
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:387)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: cascading.tap.hadoop.io.MultiInputSplit
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:385)
...
Did anybody else have success with Cascading and jars in the DistributedCache
This message seems to imply that Cascading has some internal handling of the distributed cache jars. Any light you can shed on this?
Edit: I am using Cascading 2.1.6 on Hadoop 1.0.3