import org.apache.spark.sql.SparkSession

val spark = SparkSession
        .builder()
        .appName("try1")
        .master("local")
        .getOrCreate()

// Needed for the $"..." column syntax below
import spark.implicits._

val df = spark.read
        .json("s3n://BUCKET-NAME/FOLDER/FILE.json")
        .select($"uid")
df.show(5)

I have set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as environment variables, but I get the following error when trying to read from S3:

Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/FOLDER%2FFILE.json' - ResponseCode=400, ResponseMessage=Bad Request

I suspect the error is caused by "/" being converted to "%2F" by some internal function, since the error shows '/FOLDER%2FFILE.json' instead of '/FOLDER/FILE.json'.

san8055

1 Answer


Your Spark (JVM) application cannot read environment variables unless you tell it to, so a quick workaround is to set the credentials on the Hadoop configuration explicitly:

spark.sparkContext
     .hadoopConfiguration.set("fs.s3n.awsAccessKeyId", awsAccessKeyId)
spark.sparkContext
     .hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey)
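If you want to keep the keys out of your source, a minimal sketch (assuming the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variable names from the question) is to read them from the environment with `sys.env` and fail fast when they are missing:

```scala
// Sketch: pull the credentials from the environment at startup so they
// never appear in source control, then hand them to the s3n connector.
val awsAccessKeyId = sys.env.getOrElse("AWS_ACCESS_KEY_ID",
  sys.error("AWS_ACCESS_KEY_ID is not set"))
val awsSecretAccessKey = sys.env.getOrElse("AWS_SECRET_ACCESS_KEY",
  sys.error("AWS_SECRET_ACCESS_KEY is not set"))

// These property names match the s3n:// scheme used in the question;
// the s3a:// connector uses fs.s3a.access.key / fs.s3a.secret.key instead.
spark.sparkContext
     .hadoopConfiguration.set("fs.s3n.awsAccessKeyId", awsAccessKeyId)
spark.sparkContext
     .hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey)
```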

You'll also need to specify the S3 endpoint:

spark.sparkContext
     .hadoopConfiguration.set("fs.s3a.endpoint", "<<ENDPOINT>>")
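As an alternative to setting these in code, any `hadoopConfiguration` key can also be passed at launch time using Spark's `spark.hadoop.*` prefix. A hedged sketch (the jar name is a placeholder, and `s3.eu-west-1.amazonaws.com` is just an example region endpoint):

```shell
# Sketch: pass the same Hadoop S3 settings via spark-submit instead of code.
# Every --conf spark.hadoop.X=Y becomes hadoopConfiguration key X=Y.
spark-submit \
  --conf spark.hadoop.fs.s3n.awsAccessKeyId="$AWS_ACCESS_KEY_ID" \
  --conf spark.hadoop.fs.s3n.awsSecretAccessKey="$AWS_SECRET_ACCESS_KEY" \
  --conf spark.hadoop.fs.s3a.endpoint=s3.eu-west-1.amazonaws.com \
  your-app.jar
```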

To learn more about AWS S3 endpoints, refer to the AWS documentation on S3 regions and endpoints.

eliasah
  • Thanks @eliasah, I tried including the AWS credentials in the code as you mentioned, but I still get the same error with code 400. I'm assuming this is not a credentials issue, since that would throw an authentication error (code 403)? – san8055 Jun 16 '17 at 13:46
  • There's a section on S3A troubleshooting in the Hadoop docs; you should start there. Let's just say "bad auth" has a lot of possible causes. – stevel Jun 19 '17 at 10:46