
I have a Spark cluster with YARN, and I want to put my job's jar into a 100% S3-compatible object store. From what I found on Google, submitting the job seems to be as simple as: spark-submit --master yarn --deploy-mode cluster <...other parameters...> s3://my_bucket/jar_file. However, the S3 object store requires a user name and password for access. How can I configure those credentials so that Spark can download the jar from S3? Many thanks!

Danny
  • have you tried https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html ? – morsik Mar 29 '20 at 10:07

1 Answer


You can use the Default Credential Provider Chain from the AWS docs:

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
./bin/spark-submit \
    --master local[2] \
    --class org.apache.spark.examples.SparkPi \
    s3a://your_bucket/.../spark-examples_2.11-2.4.6-SNAPSHOT.jar
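
Since the question is about a 100% S3-compatible object store (not AWS itself) and YARN cluster mode, the credentials and the store's endpoint can also be passed as Hadoop properties on the spark-submit command line instead of environment variables. This is only a sketch; the endpoint, bucket, jar, and class names below are placeholders, not values from the question:

# Sketch: pass the s3a credentials and a custom endpoint as Hadoop config.
# All values below are placeholders for your own store and application.
./bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.hadoop.fs.s3a.access.key=your_access_key \
    --conf spark.hadoop.fs.s3a.secret.key=your_secret_key \
    --conf spark.hadoop.fs.s3a.endpoint=https://objectstore.example.com \
    --class your.main.Class \
    s3a://your_bucket/your_app.jar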

I needed to download the following jars from Maven and put them into Spark's jars directory in order to use the s3a:// scheme with spark-submit (note: you can use the --packages directive to reference these dependencies from inside your jar, but not from spark-submit itself):

# build the Spark `assembly` project
sbt "project assembly" package
cd assembly/target/scala-2.11/jars/
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.7/hadoop-aws-2.7.7.jar
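
Before involving Spark, a quick way to check that the jars and credentials work is to list the bucket with the Hadoop CLI. This is just a sketch with placeholder values, and it assumes hadoop-aws and the AWS SDK jars are on the Hadoop classpath:

# Optional sanity check (sketch): list the bucket over s3a, placeholder values only.
hadoop fs \
    -D fs.s3a.access.key=your_access_key \
    -D fs.s3a.secret.key=your_secret_key \
    -D fs.s3a.endpoint=https://objectstore.example.com \
    -ls s3a://your_bucket/
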
morsik