The Oozie workflow launcher sometimes fails (KILLED status) because of classpath loading order. SparkSubmit calls an Ivy method that exists in Ivy 2.4.0 but not in Ivy 2.0.0-rc2. The workflow usually runs fine (SUCCEEDED) for most hourly nominal times, but the launch occasionally fails when Ivy 2.0 is loaded instead of Ivy 2.4. On failure, the (redacted) Oozie launcher log shows this stack trace:
2017-10-31 20:37:30,339 WARN org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[xxxx-oozie-lv-102.xxx.net] USER[xxxxx] GROUP[-] TOKEN[] APP[xxxx-proc-oozie] JOB[0143924-170929213137940-oozie-oozi-W] ACTION[0143924-170929213137940-oozie-oozi-W@xxxx] Launcher exception: org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.setDefaultConf(Ljava/lang/String;)V
java.lang.NoSuchMethodError: org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.setDefaultConf(Ljava/lang/String;)V
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1054)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:287)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:264)
    at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:214)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:60)
    at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:52)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:233)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
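The mismatch can be confirmed by checking each Ivy jar for the method with javap (the jar paths below are illustrative; on our gateway the jars come from the CDH parcel):

    # present in 2.4.0 -- prints the method signature
    javap -cp ivy-2.4.0.jar org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor | grep setDefaultConf
    # absent in 2.0.0-rc2 -- prints nothing
    javap -cp ivy-2.0.0-rc2.jar org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor | grep setDefaultConf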
It seems that the Cloudera Hadoop distribution (CDH) ships Ivy 2.0.0-rc2, while its SparkSubmit requires Ivy 2.4.0 (DefaultModuleDescriptor.setDefaultConf(String) does not exist in 2.0.0-rc2). I have tried bundling Ivy 2.4 in my jar and excluding 2.0, but the failure happens before my code is even launched, so that attempt was probably pointless. I figure there must be a way to give the 2.4.0 version precedence in the Oozie loading process, and have tried setting oozie.launcher.mapreduce.user.classpath.first to both true and false (the snippet I used is shown after the note below). In any case, the job properties file does/must contain:
oozie.libpath=${nameNode}/user/spark/share/XXXX-spark/
oozie.use.system.libpath=true
Note: placing the Ivy 2.4 jar in the libpath above did not seem to make a difference.
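For reference, this is how I set the classpath flag in the Spark action of workflow.xml (I toggled the value between true and false; neither changed the outcome):

<configuration>
    <property>
        <name>oozie.launcher.mapreduce.user.classpath.first</name>
        <value>true</value>
    </property>
</configuration>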
I suspect the workflow needs an extra flag or property, something like this:
<configuration>
    <property>
        <name>oozie.launcher.mapreduce.map.java.opts</name>
        <value>-verbose</value>
    </property>
</configuration>
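If the intent of -verbose is to trace class loading, I believe the more specific JVM form is -verbose:class (same property, value swapped; I have not tested this inside the Oozie launcher):

    <property>
        <name>oozie.launcher.mapreduce.map.java.opts</name>
        <value>-verbose:class</value>
    </property>

With that flag the launcher's stdout should contain lines like "[Loaded org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor from file:/...]", which would at least confirm which Ivy jar actually wins.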
The SRE team that manages the cluster prefers to keep the original jars shipped with CDH 5.9.2.
How can I force spark-submit to use Ivy 2.4 (and not 2.0) by changing workflow.xml, the job properties, my build, or something else, in a way that satisfies the SRE requirement to leave the CDH installation intact? Could invalidating some cache solve this?
Please be aware that an answer suggesting to add the ivy 2.4.0 jar to a classpath needs to include specifics: exactly where to put the ivy jar on HDFS, how the launcher is made to pick it up, and so on.
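To make that concrete, this is the kind of answer I am imagining, with hypothetical paths (stage the jar in the workflow application's lib/ directory, which Oozie should add to the launcher classpath, and prefer user jars); I have not gotten this to work:

    # hypothetical: copy Ivy 2.4.0 into the workflow's lib/ directory on HDFS
    hdfs dfs -put ivy-2.4.0.jar /user/xxxxx/apps/xxxx-proc-oozie/lib/

combined with oozie.launcher.mapreduce.user.classpath.first=true as shown above.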