5

I am trying to run an Apache Spark application on AWS EMR in cluster mode using spark-submit. With only one jar to add to the classpath, it works fine using the --jars and --driver-class-path options. All of my required dependency jars are located in an S3 bucket, as required by EMR. I use the following spark-submit options in the EMR add-step dialog on the AWS console:

--class org.springframework.boot.loader.JarLauncher --jars s3://emrb/gson-2.8.4.jar --driver-class-path s3://emrb/gson-2.8.4.jar

I enter these in the spark-submit options field of the add step. But when I try to provide multiple dependency jars the same way, the other jars are not picked up. I pass them as follows, and have tried various other options, but the dependent jars still cannot be found:

 --jars s3://emrb/hadoop_jars/emrfs-hadoop-assembly-2.32.0.jar,s3://emrb/gson-2.8.4.jar --driver-class-path s3://emrb/hadoop_jars/emrfs-hadoop-assembly-2.32.0.jar,s3://emrb/gson-2.8.4.jar --class org.springframework.boot.loader.JarLauncher
CodeHunter

2 Answers

0

You could specify the step parameters in a separate JSON file:

aws emr add-steps --cluster-id "j-xxx" --steps file://./steps.json

with steps.json containing something like:

[
  {
    "Type":"Spark",
    "Args": [
      "--jars",
      "s3://emrb/hadoop_jars/emrfs-hadoop-assembly-2.32.0.jar,s3://emrb/gson-2.8.4.jar",
      "--driver-class-path",
      "s3://emrb/hadoop_jars/emrfs-hadoop-assembly-2.32.0.jar,s3://emrb/gson-2.8.4.jar",
      "--class",
      "org.springframework.boot.loader.JarLauncher"
    ]
  }
]
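
If the jar list changes often, the file can be generated instead of edited by hand. The following is only a minimal sketch: `build_steps` is a hypothetical helper name, and the jar paths and main class are the ones from the question. It mirrors the Args layout shown above and does not by itself confirm that EMR resolves every jar.

```python
import json

# Jar locations taken from the question; replace with your own.
jars = [
    "s3://emrb/hadoop_jars/emrfs-hadoop-assembly-2.32.0.jar",
    "s3://emrb/gson-2.8.4.jar",
]


def build_steps(jars, main_class):
    """Hypothetical helper: build one Spark step with the same Args layout as steps.json above."""
    joined = ",".join(jars)  # comma-separated, no spaces between entries
    return [
        {
            "Type": "Spark",
            "Args": [
                "--jars", joined,
                "--driver-class-path", joined,
                "--class", main_class,
            ],
        }
    ]


steps = build_steps(jars, "org.springframework.boot.loader.JarLauncher")
steps_json = json.dumps(steps, indent=2)
print(steps_json)
```

Redirecting the output to steps.json (for example `python make_steps.py > steps.json`, where the script name is arbitrary) gives a file that can be passed to the add-steps command above.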
pulsation
-2

You can add the jar files in spark-defaults. If there is more than one entry in the jars list, use : as the separator.

You should use:

--driver-class-path s3://emrb/hadoop_jars/emrfs-hadoop-assembly-2.32.0.jar:s3://emrb/gson-2.8.4.jar

Rahul
  • Have you tried it? `:` is generally the separator here, but it also occurs in the S3 paths themselves (`s3://`), so it fails to take multiple entries that way. – CodeHunter May 07 '19 at 15:44
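
The objection in the comment can be illustrated with a plain string split. This is a simplified sketch only, not Spark's or the JVM's actual parsing code; it just shows why a colon-separated list is ambiguous when each entry is an `s3://` URI, while a comma-separated list is not.

```python
# Simplified illustration; real class path handling is done by the JVM and Spark.
jar_a = "s3://emrb/hadoop_jars/emrfs-hadoop-assembly-2.32.0.jar"
jar_b = "s3://emrb/gson-2.8.4.jar"

# Colon-separated, as suggested for --driver-class-path.
class_path = jar_a + ":" + jar_b
colon_entries = class_path.split(":")
print(len(colon_entries))  # more pieces than jars, because "s3:" contains a colon

# Comma-separated, as used for --jars.
jars_arg = jar_a + "," + jar_b
comma_entries = jars_arg.split(",")
print(comma_entries == [jar_a, jar_b])
```

Splitting on `:` breaks each URI at its scheme, so the two jars come back as four fragments, whereas splitting on `,` recovers the two original paths.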