403 error when connecting to S3 using Scala in Azure Databricks; Python with boto3 works fine

Question

I have been getting intermittent failures when trying to read from an S3 bucket from Databricks on Azure. It can go months without working, suddenly work for a while, and then stop again.

The Scala code is as follows:

val access_key = "XXXXXXXXX"
val secret_key = "XXXXXXXXX"
val encoded_secret_key = secret_key.replace("/", "%2F")
val aws_bucket_name = "bucket-name"
val file_path = "filePath"

spark.conf.set("fs.s3n.awsAccessKeyId", access_key)
spark.conf.set("fs.s3n.awsSecretAccessKey", encoded_secret_key)

var df = dbutils.fs.ls(s"""s3a://$aws_bucket_name/$file_path""")

display(df)

Sometimes it works and other times it doesn't, without any changes on our side, at least not to the code or the cluster configuration. When it fails, the error is as follows:

java.nio.file.AccessDeniedException: s3a:///: getFileStatus on s3a:///: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://..amazonaws.com {} Hadoop 2.7.4, aws-sdk-java/1.11.655 Linux/5.4.0-1063-azure OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 scala/2.12.10 vendor/Azul_Systems,_Inc. com.amazonaws.services.s3.model.GetObjectMetadataRequest; Request ID: , Extended Request ID: <long/id>, Cloud Provider: Azure, Instance ID: (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: ; S3 Extended Request ID: ), S3 Extended Request ID: :403 Forbidden

I'm not even sure how to troubleshoot this. The connection works fine with Python (boto3) in the same notebook, but the Scala code doesn't work.

We are using Spark 3.0.1 and Scala 2.12.

What are the IAM roles and permissions assigned to the user whose access and secret keys you're using? Exact and complete roles and/or inline policies, please. — Ermiya Eskandary, Jan 13 '22 at 20:49
Why are you setting the fs.s3n options? Does the Databricks documentation recommend this, or are you just copying from an SO post from ten years ago? — stevel, Jan 17 '22 at 12:37
@stevel This was just the code we had that was working before. The document I've read only has python code and not scala. What should we be using instead? — ewong18, Jan 18 '22 at 00:27
use the s3a docs, not superstition passed down one incorrect SO post at a time https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html — stevel, Jan 18 '22 at 17:04
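
For reference, here is a minimal sketch of what the snippet in the question might look like with the s3a options from the Hadoop docs linked above. It is not a confirmed fix. It assumes a Databricks Scala notebook where `spark`, `sc`, `dbutils`, and `display` are predefined, and it assumes that `dbutils.fs.ls` honors values set on `sc.hadoopConfiguration` for this runtime, which should be verified. The key, bucket, and path values are the same placeholders as in the question. Two things differ from the original: the `fs.s3a.*` property names match the `s3a://` scheme used in the path, and the secret key is passed as-is, since replacing "/" with "%2F" is only relevant when a secret is embedded in a URL.

```scala
// Sketch only: assumes a Databricks Scala notebook (spark, sc, dbutils, display predefined).
// All values below are placeholders carried over from the question.
val accessKey = "XXXXXXXXX"
val secretKey = "XXXXXXXXX" // raw secret, not URL-encoded
val awsBucketName = "bucket-name"
val filePath = "filePath"

// The s3a connector reads fs.s3a.* properties; the fs.s3n.* properties
// belong to the older s3n connector and do not configure s3a:// paths.
sc.hadoopConfiguration.set("fs.s3a.access.key", accessKey)
sc.hadoopConfiguration.set("fs.s3a.secret.key", secretKey)

// List the path through the s3a connector and show the result.
val files = dbutils.fs.ls(s"s3a://$awsBucketName/$filePath")
display(files)
```

If the 403 persists with these settings, the IAM policy attached to the key (as asked in the first comment) is the next thing to check, because boto3 in the same notebook may be picking up credentials from a different source than the Hadoop connector.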
