I am trying to run an Apache Spark application on AWS EMR in cluster mode using spark-submit. If I have only one jar to provide on the classpath, it works fine using the --jars and --driver-class-path options. All of my required dependency jars are located in an S3 bucket, as EMR requires. I am using the following command for that purpose in the EMR "Add Step" option on the AWS console:
--class org.springframework.boot.loader.JarLauncher --jars s3://emrb/gson-2.8.4.jar --driver-class-path s3://emrb/gson-2.8.4.jar
I provide these options in the spark-submit options area of the Add Step dialog. But if I want to provide multiple dependent jars in the same way, it does not pick up the other jars. I am passing them as follows and have tried various options, but it cannot find the dependent jars:
--jars s3://emrb/hadoop_jars/emrfs-hadoop-assembly-2.32.0.jar,s3://emrb/gson-2.8.4.jar --driver-class-path s3://emrb/hadoop_jars/emrfs-hadoop-assembly-2.32.0.jar,s3://emrb/gson-2.8.4.jar --class org.springframework.boot.loader.JarLauncher
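For reference, a sketch of how I understand the separators are supposed to work, based on the Spark documentation (this is an assumption about the cause, not a confirmed fix): --jars takes a comma-separated list, while --driver-class-path is passed through as a raw JVM classpath, which on Linux is colon-separated rather than comma-separated.

```shell
# Hypothetical multi-jar form, assuming Spark's documented separator rules:
# --jars is a comma-separated list of jars to distribute, while
# --driver-class-path is a plain JVM classpath, so ':' on Linux, not ','.
--class org.springframework.boot.loader.JarLauncher \
--jars s3://emrb/hadoop_jars/emrfs-hadoop-assembly-2.32.0.jar,s3://emrb/gson-2.8.4.jar \
--driver-class-path s3://emrb/hadoop_jars/emrfs-hadoop-assembly-2.32.0.jar:s3://emrb/gson-2.8.4.jar
```

I have not confirmed whether S3 paths are usable directly in --driver-class-path in cluster mode, or whether the jars first need to be localized onto the cluster nodes.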