
I have an EMR cluster (v5.12.1) and an S3 bucket, both set up with encryption at rest using the same AWS SSE-KMS key.

Reading the data from S3 works fine, but when I write to my S3 bucket using a PySpark script, the Parquet files are encrypted with the default 'aws/s3' key.

How can I get Spark to use the correct KMS key?

The cluster has Hadoop 2.8.3 and Spark 2.2.1.
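One way to confirm which key an output object was actually encrypted with is to inspect its metadata with boto3; this is just a sketch, and the bucket and object key below are placeholders rather than anything from the original post:

import boto3

s3 = boto3.client("s3")

# head_object returns the server-side encryption settings of an object,
# including the KMS key ARN when SSE-KMS was used.
resp = s3.head_object(
    Bucket="my-output-bucket",            # placeholder bucket name
    Key="output/part-00000.parquet",      # placeholder object key
)

print(resp.get("ServerSideEncryption"))   # e.g. "aws:kms"
print(resp.get("SSEKMSKeyId"))            # ARN of the key actually used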

– minus34
2 Answers


The solution is not to use s3a:// or s3n:// paths for your output files.

The files will be written to S3 and encrypted with the correct SSE-KMS key if you use the s3:// prefix only.
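For example, a minimal PySpark sketch of the fix; the bucket and path names are placeholders, not values from the original post:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sse-kms-output").getOrCreate()

df = spark.read.parquet("s3://my-bucket/input/")   # placeholder input path

# Per the answer above, writes through s3a:// or s3n:// paths ended up with
# the default aws/s3 key; the s3:// prefix (EMRFS on EMR) picks up the
# cluster's SSE-KMS key instead.
df.write.mode("overwrite").parquet("s3://my-bucket/output/")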

– minus34

If you are using a CMK (customer managed key), make sure you specify it when creating the EMR cluster, in the configuration section:

{
    "Classification": "emrfs-site",
    "Properties": {
        "fs.s3.enableServerSideEncryption": "true",
        "fs.s3.serverSideEncryption.kms.keyId": "<YOUR_CMK>"
    }
}
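As a rough sketch, the same classification can also be passed programmatically when launching the cluster, for example with boto3; the cluster name, instance settings, and IAM roles below are placeholder assumptions:

import boto3

emr = boto3.client("emr")

# Launch an EMR cluster with the emrfs-site classification so that EMRFS
# writes are encrypted with the given customer managed KMS key.
emr.run_job_flow(
    Name="spark-sse-kms",                      # placeholder cluster name
    ReleaseLabel="emr-5.12.1",
    Applications=[{"Name": "Spark"}, {"Name": "Hadoop"}],
    Configurations=[
        {
            "Classification": "emrfs-site",
            "Properties": {
                "fs.s3.enableServerSideEncryption": "true",
                "fs.s3.serverSideEncryption.kms.keyId": "<YOUR_CMK>",
            },
        }
    ],
    Instances={
        "MasterInstanceType": "m4.large",      # placeholder instance settings
        "SlaveInstanceType": "m4.large",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",         # default EMR roles assumed
    ServiceRole="EMR_DefaultRole",
)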