I have a Spark cluster running on YARN, and I want to put my job's jar into a 100% S3-compatible object store. From what I found on Google, submitting the job should be as simple as:

spark-submit --master yarn --deploy-mode cluster <...other parameters...> s3://my_bucket/jar_file

However, the object store requires credentials for access. How can I configure those credentials so that Spark can download the jar from S3? Many thanks!
have you tried https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html ? – morsik Mar 29 '20 at 10:07
1 Answer
You can use the Default Credential Provider Chain. From the AWS docs:
# make the credentials visible to the S3A connector on the submitting machine
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key

./bin/spark-submit \
  --master local[2] \
  --class org.apache.spark.examples.SparkPi \
  s3a://your_bucket/.../spark-examples_2.11-2.4.6-SNAPSHOT.jar
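Since the question is about a 100% S3-compatible object store rather than AWS itself, the S3A connector usually also needs to know the store's endpoint, not just the credentials. As a minimal sketch (the endpoint https://s3.example.internal and the main class com.example.MyJob are placeholders, not values from the question), the credentials and endpoint can be passed as Hadoop properties through spark-submit:

# endpoint, class name and bucket are placeholders; replace with your store's values
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.hadoop.fs.s3a.endpoint=https://s3.example.internal \
  --conf spark.hadoop.fs.s3a.access.key=your_access_key \
  --conf spark.hadoop.fs.s3a.secret.key=your_secret_key \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --class com.example.MyJob \
  s3a://my_bucket/jar_file

Many non-AWS stores need path-style access enabled, which is what fs.s3a.path.style.access=true does; the same properties can also be set in core-site.xml instead of on the command line.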
I needed to download the following jars from Maven and put them into Spark's jars directory in order to use the s3a scheme with spark-submit (note: you can use the --packages directive to reference these dependencies from inside your application, but not for spark-submit itself):
# build the Spark `assembly` project (when building Spark from source)
sbt "project assembly" package
cd assembly/target/scala-2.11/jars/

# fetch the S3A connector and its matching AWS SDK
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.7/hadoop-aws-2.7.7.jar
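If you use a pre-built Spark distribution rather than building Spark from source, the same two jars can simply be dropped into the jars/ directory of the installation. A sketch, assuming a hypothetical install path of /opt/spark and the same Hadoop 2.7 line as above (the hadoop-aws version should match the Hadoop libraries your Spark ships with, and hadoop-aws 2.7.x pairs with aws-java-sdk 1.7.4):

# hypothetical install path; adjust to where your Spark distribution lives
cd /opt/spark/jars/
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.7/hadoop-aws-2.7.7.jar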

morsik