
I am trying to connect from Spark (running on my PC) to my S3 bucket:

    val spark = SparkSession
      .builder
      .appName("S3Client")
      .config("spark.master", "local")
      .getOrCreate()

    val sc = spark.sparkContext
    sc.hadoopConfiguration.set("fs.s3a.access.key", ACCESS_KEY)
    sc.hadoopConfiguration.set("fs.s3a.secret.key", SECRET_KEY)
    val txtFile = sc.textFile("s3a://bucket-name/folder/file.txt")
    val contents = txtFile.collect()

But I am getting the following exception:

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 07A7BDC9135BCC84, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: 6ly2vhZ2mAJdQl5UZ/QUdilFFN1hKhRzirw6h441oosGz+PLIvLW2fXsZ9xmd8cuBrNHCdh8UPE=

I have seen this question but it didn't help me.

Edit:

As Zack suggested, I added:

sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")

But I still get the same exception.

Alon
  • Could you try adding `sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")`? – Knight71 Jul 22 '19 at 09:04

4 Answers


I've solved the problem.

I was targeting a region (Frankfurt) that requires Signature Version 4 signing.

I've changed the region of the S3 bucket to Ireland and now it's working.
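If changing the bucket's region is not an option, enabling Signature Version 4 (as the answers below describe) should also work. A minimal sketch, reusing `sc` from the question and assuming the bucket stays in eu-central-1 with Spark running locally:

    // Enable Signature Version 4 in the AWS SDK before the first S3 access;
    // in local mode the driver and executors share this one JVM.
    System.setProperty("com.amazonaws.services.s3.enableV4", "true")
    // V4-only regions also need the region-specific endpoint.
    sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")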

Alon

According to the S3 documentation, some regions only support Signature Version 4, so you need to add the configurations below:

--conf "spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true"

and

--conf "spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true"
Hayman

Alon,

try the configurations below:

    val spark = SparkSession
      .builder
      .appName("S3Client")
      .config("spark.master", "local")
      .getOrCreate()

    val sc = spark.sparkContext
    sc.hadoopConfiguration.set("fs.s3a.access.key", ACCESS_KEY)
    sc.hadoopConfiguration.set("fs.s3a.secret.key", SECRET_KEY)
    sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.us-east-1.amazonaws.com")
    val txtFile = sc.textFile("s3a://bucket-name/folder/file.txt")
    val contents = txtFile.collect()

I believe your issue was that you did not specify the endpoint in the configuration. Substitute us-east-1 with whichever region your bucket is in.
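
For the bucket in the question, which is in Frankfurt, that would look like the sketch below; the region value is just a placeholder for whatever your bucket actually uses:

    // The endpoint follows the pattern s3.<region>.amazonaws.com.
    val region = "eu-central-1"  // the question's bucket region; replace with yours
    sc.hadoopConfiguration.set("fs.s3a.endpoint", s"s3.$region.amazonaws.com")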

Zack

This works for me (this is everything; no other export etc. is needed):

    sparkContext._jsc.hadoopConfiguration().set("fs.s3a.access.key", AWS_KEY)
    sparkContext._jsc.hadoopConfiguration().set("fs.s3a.secret.key", AWS_SECRET)
    sparkContext._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3.us-east-2.amazonaws.com")

To run:

spark-submit --conf spark.driver.extraJavaOptions='-Dcom.amazonaws.services.s3.enableV4' --conf spark.executor.extraJavaOptions='-Dcom.amazonaws.services.s3.enableV4'   --packages org.apache.hadoop:hadoop-aws:2.7.1  spark_read_s3.py
Sanjay Das