
We want to back up the HDFS data in our Cloudera Hadoop cluster to Amazon S3. It looks like we can use distcp for this, but what is not clear is whether the data is copied to S3 over an encrypted transport.

Is there something that needs to be configured to enable this?

Marco Di Cesare
    http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_admin_distcp_data_cluster_migrate.html – Hy L Feb 07 '15 at 00:30
  • Thank you. I had read that page but it doesn't seem to indicate if distcp uses SSL/TLS while the data is in transit between Hadoop and S3. – Marco Di Cesare Feb 09 '15 at 00:06

1 Answer


I don't think S3 client-side encryption is available in Hadoop yet.

It seems that S3 server-side encryption (encrypting data at rest on S3's end) is configurable from Hadoop 2.5.0 onward.

To enable it, add the following property to core-site.xml:

<property>
  <name>fs.s3n.server-side-encryption-algorithm</name>
  <value>AES256</value>
  <description>
    Specify a server-side encryption algorithm for S3.
    The default is NULL, and the only other currently allowable value is AES256.
  </description>
</property>

More information about S3 server-side encryption is in HADOOP-10568.
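As a sketch, the same property can also be passed on the distcp command line instead of (or in addition to) core-site.xml. The bucket name, paths, and credential placeholders below are hypothetical; replace them with your own:

```shell
# Hypothetical example: copy HDFS data to S3 via the s3n connector,
# requesting AES256 server-side encryption (Hadoop 2.5.0+).
hadoop distcp \
  -Dfs.s3n.server-side-encryption-algorithm=AES256 \
  -Dfs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY \
  -Dfs.s3n.awsSecretAccessKey=YOUR_SECRET_KEY \
  hdfs://namenode:8020/data/backup \
  s3n://your-bucket/backup
```

Note that on newer Hadoop versions the s3a connector replaces s3n, and the corresponding property is `fs.s3a.server-side-encryption-algorithm`, as a comment below points out.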

Ashrith
  • Thanks for your reply. What I am looking for is how to ensure data is encrypted while in transit; in other words, does distcp transfer the data over SSL/TLS to Amazon S3? – Marco Di Cesare Feb 09 '15 at 00:07
  • 1
    From what I can see, encryption in transit works by default. I tested this by placing a policy on my bucket that prevents putobject when securetransport=false. The distcp command would have failed with a 403 error if securetransport was not enabled, so i believe this works – nachonachoman Dec 04 '15 at 16:59
  • I think the property name is `fs.s3a.server-side-encryption-algorithm` – Vishrant May 19 '20 at 16:38
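The SecureTransport test described in the comments can be expressed as a bucket policy along these lines (the bucket name is a placeholder); any PutObject request arriving over plain HTTP is denied, so a successful distcp implies the transfer used TLS:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-bucket/*",
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    }
  ]
}
```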