
I am trying to connect from Spark (running on my PC) to my S3 bucket:

    val spark = SparkSession
      .builder
      .appName("S3Client")
      .config("spark.master", "local")
      .getOrCreate()

    val sc = spark.sparkContext
    sc.hadoopConfiguration.set("fs.s3a.access.key", ACCESS_KEY)
    sc.hadoopConfiguration.set("fs.s3a.secret.key", SECRET_KEY)
    val txtFile = sc.textFile("s3a://bucket-name/folder/file.txt")
    val contents = txtFile.collect()

But I am getting the following exception:

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 07A7BDC9135BCC84, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: 6ly2vhZ2mAJdQl5UZ/QUdilFFN1hKhRzirw6h441oosGz+PLIvLW2fXsZ9xmd8cuBrNHCdh8UPE=

I have seen this question but it didn't help me.

Edit:

As Zack suggested, I added:

sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")

But I still get the same exception.

Alon
  • Could you try adding `sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")`? – Knight71 Jul 22 '19 at 09:04

4 Answers


I've solved the problem.

I was targeting a region (Frankfurt) that requires Signature Version 4 signing.

I've changed the region of the S3 bucket to Ireland and now it's working.
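If changing the bucket's region is not an option, enabling Signature Version 4 (as the answers below describe) should also work. A minimal sketch, reusing `sc` from the question and assuming the bucket stays in eu-central-1 with Spark running locally:

    // Enable Signature Version 4 in the AWS SDK before the first S3 access;
    // in local mode the driver and executors share this one JVM.
    System.setProperty("com.amazonaws.services.s3.enableV4", "true")
    // V4-only regions also need the region-specific endpoint.
    sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")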

Alon

According to the S3 documentation, some regions only support Signature Version 4, so you need to add the configurations below:

--conf "spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true"

and

--conf "spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true"
Hayman

Alon,

try the configurations below:

    val spark = SparkSession
      .builder
      .appName("S3Client")
      .config("spark.master", "local")
      .getOrCreate()

    val sc = spark.sparkContext
    sc.hadoopConfiguration.set("fs.s3a.access.key", ACCESS_KEY)
    sc.hadoopConfiguration.set("fs.s3a.secret.key", SECRET_KEY)
    sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.us-east-1.amazonaws.com")
    val txtFile = sc.textFile("s3a://bucket-name/folder/file.txt")
    val contents = txtFile.collect()

I believe your issue was that you did not specify the endpoint in the configuration. Substitute us-east-1 with whichever region your bucket is in.
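
For the bucket in the question, which is in Frankfurt, that would look like the sketch below; the region value is just a placeholder for whatever your bucket actually uses:

    // The endpoint follows the pattern s3.<region>.amazonaws.com.
    val region = "eu-central-1"  // the question's bucket region; replace with yours
    sc.hadoopConfiguration.set("fs.s3a.endpoint", s"s3.$region.amazonaws.com")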

Zack

This works for me (this is everything; no other export etc. is needed):

    sparkContext._jsc.hadoopConfiguration().set("fs.s3a.access.key", AWS_KEY)
    sparkContext._jsc.hadoopConfiguration().set("fs.s3a.secret.key", AWS_SECRET)
    sparkContext._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3.us-east-2.amazonaws.com")

To run:

spark-submit --conf spark.driver.extraJavaOptions='-Dcom.amazonaws.services.s3.enableV4' --conf spark.executor.extraJavaOptions='-Dcom.amazonaws.services.s3.enableV4'   --packages org.apache.hadoop:hadoop-aws:2.7.1  spark_read_s3.py
Sanjay Das