My task is to process satellite images with Spark. The images are stored in S3 in the eu-central-1 (Frankfurt) region. To access those files I need an S3 client that supports the Signature Version 4 (V4) authentication API.

My PySpark code works against a US bucket without any problem, but fails with EU buckets because they only accept V4 authentication. I followed this tutorial and applied this troubleshooting guide, but I still get this error:

$ pyspark --properties-file s3.properties
...
>>> image_rdd = sc.binaryFiles('s3a://sentinel-s2-l1c/tiles/31/U/FT/2017/10/15/0/preview.jp2')
...
py4j.protocol.Py4JJavaError: An error occurred while calling o19.binaryFiles.
: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: BD9957E5F3960247, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: oCgfA+foevj6CEFWO0F22H+AVbqr4F0hr7c4M7OlILxOSb0ZZ25FqHhZnzgyLxRPMuiyeOdjnSM=
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
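
As a sanity check, something like the following (a minimal sketch assuming boto3 is installed; boto3 signs requests with Signature Version 4 by default) should confirm whether the credentials and region work outside of Spark:

import boto3

# boto3 uses Signature Version 4 by default, so this isolates
# the problem to the Spark/Hadoop side if it succeeds
s3 = boto3.client('s3', region_name='eu-central-1')

# HEAD the same object Spark fails on; raises
# botocore.exceptions.ClientError if the request is rejected
s3.head_object(Bucket='sentinel-s2-l1c',
               Key='tiles/31/U/FT/2017/10/15/0/preview.jp2')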

My s3.properties file:

spark.hadoop.fs.s3a.impl        org.apache.hadoop.fs.s3a.S3AFileSystem
spark.driver.extraClassPath     hadoop-aws-2.7.3.jar:aws-java-sdk-1.7.4.jar
spark.hadoop.fs.s3a.endpoint    s3.eu-central-1.amazonaws.com
spark.hadoop.fs.s3a.access.key  [access_key]
spark.hadoop.fs.s3a.secret.key  [secret_key]
spark.hadoop.fs.s3a.impl.disable.cache true
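
One addition I am unsure about: several sources suggest that the 1.7.x AWS Java SDK only switches to Signature Version 4 when the com.amazonaws.services.s3.enableV4 system property is set on the JVM. A sketch of how that would look in the same properties file (I cannot confirm this is the right place to set it):

spark.driver.extraJavaOptions    -Dcom.amazonaws.services.s3.enableV4=true
spark.executor.extraJavaOptions  -Dcom.amazonaws.services.s3.enableV4=true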

My setup:

  • macOS
  • local Apache Spark built with Scala 2.11.8, running on Java HotSpot(TM) 64-Bit Server VM 1.8.0_144
  • Python 3.6.3

Question: What is the correct way to configure PySpark to connect to S3 with V4 authentication, and where should the configuration parameters go?
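
For completeness, a sketch of the programmatic alternative I am considering, setting the same fs.s3a.* keys on the Hadoop configuration at runtime (note that sc._jsc is a private PySpark attribute, not a documented API, so this is a hedged workaround):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# sc._jsc is PySpark's handle to the underlying Java SparkContext;
# its Hadoop configuration is what the S3A filesystem actually reads
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")
hadoop_conf.set("fs.s3a.access.key", "[access_key]")
hadoop_conf.set("fs.s3a.secret.key", "[secret_key]")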
