I have been getting intermittent issues when trying to read from an S3 bucket from Databricks on Azure. It can go months without working, suddenly work temporarily, and then stop again.
The Scala code is as follows:
// Static credentials (redacted)
val access_key = "XXXXXXXXX"
val secret_key = "XXXXXXXXX"
// URL-encode slashes in the secret key
val encoded_secret_key = secret_key.replace("/", "%2F")
val aws_bucket_name = "bucket-name"
val file_path = "filePath"
// Set the keys on the Spark conf (s3n property names)
spark.conf.set("fs.s3n.awsAccessKeyId", access_key)
spark.conf.set("fs.s3n.awsSecretAccessKey", encoded_secret_key)
// List the path and display the result
val df = dbutils.fs.ls(s"""s3a://$aws_bucket_name/$file_path""")
display(df)
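In case the property names matter: the s3a connector has its own credential keys on the Hadoop configuration (fs.s3a.access.key / fs.s3a.secret.key), distinct from the s3n ones set above. A minimal sketch of that variant is below; it uses the raw secret key on the assumption that the %2F encoding is only needed when the key is embedded in a URL, and I have not verified that it changes the intermittent behaviour.
// Variant: set the s3a-specific credential properties on the Hadoop configuration
// (property names per the hadoop-aws documentation).
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", access_key)
// Raw secret key; the %2F substitution is assumed unnecessary for a property value
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", secret_key)
val listing = dbutils.fs.ls(s"s3a://$aws_bucket_name/$file_path")
display(listing)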
Sometimes it works and other times it doesn't, without any changes on my side, at least not to the code or the cluster configuration. When it does fail, the error is as follows:
java.nio.file.AccessDeniedException: s3a:///: getFileStatus on s3a:///: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://..amazonaws.com {} Hadoop 2.7.4, aws-sdk-java/1.11.655 Linux/5.4.0-1063-azure OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 scala/2.12.10 vendor/Azul_Systems,_Inc. com.amazonaws.services.s3.model.GetObjectMetadataRequest; Request ID: , Extended Request ID: <long/id>, Cloud Provider: Azure, Instance ID: (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: ; S3 Extended Request ID: ), S3 Extended Request ID: :403 Forbidden
I'm not even sure how to troubleshoot this. The connection works fine with Python (boto3) in the same notebook, but the Scala code doesn't.
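One isolation step I can think of is to call the AWS Java SDK (already on the cluster, per the aws-sdk-java/1.11.655 in the stack trace) directly from Scala, roughly the equivalent of the boto3 call that works, to see whether the 403 comes from the credentials/bucket policy or from the Hadoop s3a layer. A rough sketch, with the region as a placeholder I would have to fill in:
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import scala.collection.JavaConverters._

// Build an S3 client with the same static keys, bypassing the Hadoop s3a layer.
// "us-east-1" is a placeholder; substitute the bucket's actual region.
val s3 = AmazonS3ClientBuilder.standard()
  .withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials(access_key, secret_key)))
  .withRegion("us-east-1")
  .build()

// List a few keys under the same prefix; a 403 here would point at the
// credentials or bucket policy rather than the Spark/Hadoop configuration.
s3.listObjectsV2(aws_bucket_name, file_path)
  .getObjectSummaries.asScala.take(10).foreach(o => println(o.getKey))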
We are using Spark 3.0.1 and Scala 2.12.