I am trying to write an RDD to S3 with server-side encryption. Here is my code:

val sparkConf = new SparkConf().
  setMaster("local[*]").
  setAppName("aws-encryption")
val sc = new SparkContext(sparkConf)
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", AWS_ACCESS_KEY)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", AWS_SECRET_KEY)
sc.hadoopConfiguration.setBoolean("fs.s3n.sse.enabled", true)
sc.hadoopConfiguration.set("fs.s3n.enableServerSideEncryption", "true")
sc.hadoopConfiguration.setBoolean("fs.s3n.enableServerSideEncryption", true)
sc.hadoopConfiguration.set("fs.s3n.sse", "SSE-KMS")
sc.hadoopConfiguration.set("fs.s3n.serverSideEncryptionAlgorithm", "SSE-KMS")
sc.hadoopConfiguration.set("fs.s3n.server-side-encryption-algorithm", "SSE-KMS")
sc.hadoopConfiguration.set("fs.s3n.sse.kms.keyId", KMS_ID)
sc.hadoopConfiguration.set("fs.s3n.serverSideEncryptionKey", KMS_ID)

val rdd = sc.parallelize(Seq("one", "two", "three", "four"))
rdd.saveAsTextFile(s"s3n://$bucket/$objKey")

This code writes the RDD to S3, but without encryption: the properties of the written object show server-side encryption as "no". Am I missing something, or using a property incorrectly?

Any suggestions would be appreciated.

P.S. I have set the same property under several different names because I am not sure which name applies when, e.g.:

sc.hadoopConfiguration.setBoolean("fs.s3n.sse.enabled", true)
sc.hadoopConfiguration.set("fs.s3n.enableServerSideEncryption", "true")
sc.hadoopConfiguration.setBoolean("fs.s3n.enableServerSideEncryption", true)

Thank you.

Vikash Pareek

1 Answer

  1. Stop using s3n and switch to s3a. I don't remember what s3n does with encryption, but the move is worth making for performance and scale alone.
  2. Start with SSE-S3 rather than SSE-KMS, as it's easier to set up.
  3. Turn on encryption in the client via the relevant s3a properties (see below).
  4. Add a bucket policy to mandate encryption. That makes sure all clients are always set up correctly.

Example client property

<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>
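
For the Spark code in the question, the same setting can be applied programmatically instead of through an XML file. The following is a minimal sketch, not a definitive setup: the object and method names (`S3aSseS3Settings`, `sseS3`) are hypothetical, and only the property name and value are taken from the XML above.

```scala
// Hypothetical helper holding the s3a client property for SSE-S3.
// Only the key and value come from the XML example; the names here are illustrative.
object S3aSseS3Settings {
  // Returns the key/value pairs that switch on SSE-S3 (AES256) in the s3a client.
  def sseS3: Map[String, String] = Map(
    "fs.s3a.server-side-encryption-algorithm" -> "AES256"
  )
}

// Usage in a Spark job (assumes an existing SparkContext `sc`, as in the question):
//   S3aSseS3Settings.sseS3.foreach { case (k, v) => sc.hadoopConfiguration.set(k, v) }
//   rdd.saveAsTextFile(s"s3a://$bucket/$objKey")   // note the s3a:// scheme
```

The settings only take effect for paths written through the `s3a://` scheme, so the output URL has to change along with the property names.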

See Working with Encrypted Amazon S3 Data; as of October 2019 these are the best current docs on encrypting S3 data with S3A in Hadoop, Spark and Hive.

AWS EMR readers: None of this applies to you. Switch to Apache Hadoop or look up the EMR docs.

stevel
  • Thank you for the response. I am able to upload data to S3 with encryption using the AES256 method, but my requirement is to use KMS for the encryption. When I try to use "SSE-KMS" it throws the following exception: Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 97E10A1A684D23AD, AWS Error Code: InvalidArgument, AWS Error Message: The encryption method specified is not supported, S3 Extended Request ID: rzcCh6RKy3FuG5SU1Q0pDANHvxnAH/dk77Mlbbs8KaCW1QwunOm81GkvD9Furz1MOzfqtZR2ALg= – Vikash Pareek Sep 11 '17 at 10:27
  • KMS is Hadoop 2.8.0+ only – stevel Sep 14 '17 at 09:38
  • I have tried with Hadoop 2.8.1, still getting the same exception: The encryption method specified is not supported (Service: Amazon S3; Status Code: 400; Error Code: InvalidArgument; Request ID: 52ECB3284409F786) – Vikash Pareek Sep 18 '17 at 10:33
  • Well, afraid you are now down to where the rest of us would be: hooking a debugger to the service – stevel Sep 18 '17 at 12:18
  • Alright, will proceed with debugging. As an experiment, I was able to use KMS with the Hadoop 3.0.0alpha version. Sorry for asking, but could you please let me know if there is an official document which mentions that KMS is supported with Hadoop 2.8.0+? – Vikash Pareek Sep 18 '17 at 12:32
  • You'll have to rummage around the hadoop.apache.org site. – stevel Sep 20 '17 at 13:16
  • Looking some more, SSE-KMS only went in for 2.9+: https://issues.apache.org/jira/browse/HADOOP-13075. However, AWS now supports default bucket encryption: you can configure the bucket and everything is always KMS encrypted. – stevel Dec 21 '17 at 11:40
  • That Hortonworks link is broken. Is [this](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.0/bk_cloud-data-access/content/s3-encryption.html) the new location? – Nick Chammas Oct 28 '19 at 14:46
  • Yeah, updated. Thanks Nick. My stance is: set the bucket for SSE-S3 or, if you want strict, use SSE-KMS. For that, know that the Chinese govt approves the crypto hardware in their country – stevel Oct 28 '19 at 16:09
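
For readers on a Hadoop release whose s3a client supports SSE-KMS (per the comments above, 2.9+ or the 3.0.0 alphas), the client-side settings might look like the sketch below. It assumes the property names `fs.s3a.server-side-encryption-algorithm` and `fs.s3a.server-side-encryption.key` from the S3A documentation for those releases; later Hadoop releases renamed them, so check the docs for your version. The object name, method name and `kmsKeyId` parameter are hypothetical.

```scala
// Hypothetical helper for SSE-KMS settings in the s3a client.
// Assumption: property names as documented for Hadoop 2.9 / 3.0-era S3A.
object S3aKmsSettings {
  // kmsKeyId is a placeholder for your own KMS key ID or ARN.
  def sseKms(kmsKeyId: String): Map[String, String] = Map(
    "fs.s3a.server-side-encryption-algorithm" -> "SSE-KMS",
    "fs.s3a.server-side-encryption.key" -> kmsKeyId
  )
}

// Usage in a Spark job (assumes an existing SparkContext `sc` and the KMS_ID
// value from the question):
//   S3aKmsSettings.sseKms(KMS_ID).foreach { case (k, v) => sc.hadoopConfiguration.set(k, v) }
//   rdd.saveAsTextFile(s"s3a://$bucket/$objKey")
```

If the bucket already has default encryption configured with KMS, as mentioned in the comments, these client settings may be unnecessary.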