I am using IBM object storage (comparable to AWS S3) to store data. IBM's object storage implements the S3 API, and Spark's Hadoop configuration can be modified to let Spark connect to Amazon S3. I am attempting (in PySpark) to access my data by setting the endpoint to point to IBM instead of Amazon:
sc._jsc.hadoopConfiguration().set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3-api.us-geo.objectstorage.service.networklayer.com")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "<my_key>")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "<secret_key>")
This throws the following error:
An error occurred while calling o131.partitions.
: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 7F46A12CDBB841AA)
Note the "Service: Amazon S3;" Which makes me assume that the SDK is still pointed towards AWS. Can this be changed?