
I'm executing a spark-submit script in an EMR step that runs a main class from my super JAR, like this:

  spark-submit \
    ....
    --class ${MY_CLASS} "${SUPER_JAR_S3_PATH}"

... etc

but by default Spark loads file:/usr/lib/spark/jars/guice-3.0.jar, which contains com.google.inject.internal.InjectorImpl, a class that's also in the Guice 4.x jar inside my super JAR. This results in a java.lang.IllegalAccessError when my service boots up.
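
For context, a quick way to confirm which Spark jars bundle the conflicting class (a rough sketch, assuming the default EMR layout under /usr/lib/spark/jars):

  # list every Spark jar that contains the conflicting class
  for j in /usr/lib/spark/jars/*.jar; do
    unzip -l "$j" 2>/dev/null \
      | grep -q 'com/google/inject/internal/InjectorImpl.class' \
      && echo "$j"
  done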

I've tried setting some Spark conf in the spark-submit to put my super JAR on the classpath, in the hope that it gets loaded first, before Spark loads guice-3.0.jar. It looks like this:

  --jars "${ASSEMBLY_JAR_S3_PATH}" \
  --driver-class-path "/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:${SUPER_JAR_S3_PATH}" \
  --conf spark.executor.extraClassPath="/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:${SUPER_JAR_S3_PATH}" \

but this results in the same error.

Is there a way to remove that guice-3.0.jar from the default Spark classpath so my code can use the InjectorImpl that's packaged in the Guice 4.x JAR? I'm also running Spark in client mode, so I can't use spark.driver.userClassPathFirst or spark.executor.userClassPathFirst.

user3613290
1 Answer


One way is to point at the directory that contains the old Guice jar, build the jar list from it, and exclude guice-3.0.jar from that list.

A sample shell script for the spark-submit:

  #!/bin/sh
  # Path to the Guice 4.x jar that should win over guice-3.0.jar
  export latestguicejar='your path to latest guice jar'

  # Build a comma-separated list of every Spark jar except guice-3.0.jar
  OTHER_JARS=""
  for eachjarinlib in /usr/lib/spark/jars/*.jar; do
    if [ "$(basename "$eachjarinlib")" != "guice-3.0.jar" ]; then
      OTHER_JARS="$eachjarinlib,$OTHER_JARS"
    fi
  done
  OTHER_JARS="${OTHER_JARS%,}"   # strip the trailing comma
  echo "--- final list of jars: $OTHER_JARS"

  spark-submit --verbose --class <yourclass> \
    ... OTHER OPTIONS \
    --jars "$OTHER_JARS,$latestguicejar,APPLICATIONJARTOBEADDEDSEPARATELY.JAR"

Also see Holden's answer, and check what is available in your version of Spark.

As per the runtime environment docs, the userClassPathFirst properties are present in the latest version of Spark as of today:

  spark.executor.userClassPathFirst
  spark.driver.userClassPathFirst

To use these, you can build an uber jar with all application-level dependencies.
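
If you can run in cluster mode, a minimal sketch of a submit using those flags (the class and jar names here are placeholders):

  spark-submit \
    --deploy-mode cluster \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true \
    --class <yourclass> your-uber-jar.jar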

Ram Ghadiyaram
  • Holden's answer / userClassPathFirst doesn't apply to this situation since I'm running Spark in client mode (I'll edit my question to include this information, my bad). I'll try out your suggestion though. – user3613290 May 22 '19 at 13:23
  • OK, do you still need the driver and executor classpaths since you are sending --jars with your uber jar? I mean, you can put your uber jar on the driver and executor classpath if you really have to. – Ram Ghadiyaram May 22 '19 at 20:56
  • Hi, any luck here? Were you able to resolve this issue? – Ram Ghadiyaram Jun 02 '19 at 01:52