We are running Hadoop 3.2.1 in a secure datacenter, in an environment without multiple users. We want data transfers between nodes encrypted. We have determined that we do not need to set up Kerberos, so I am working through getting encryption going on block data transfer and web services.

I appear to have DFS encryption enabled thanks to the following settings in hdfs-site.xml:

<!-- SECURITY -->
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
  </property>

I was getting handshake errors on the datanodes with dfs.encrypt.data.transfer enabled until I also set dfs.block.access.token.enable.

Filesystem operations work great now, but I still see plenty of this:

2020-02-04 15:25:59,492 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

I reckon that SASL is a Kerberos feature that I shouldn't ever expect to see reported as true. Does that sound right?

Is there a way to verify that DFS is encrypting data between nodes? (I could get a sniffer out...)
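Short of a full sniffer session: one sanity check is to capture a few packets from the DataNode transfer port (9866 by default in Hadoop 3) with tcpdump and eyeball whether the payload still contains readable file content. Here is a rough heuristic for that check, sketched in Python; the 0.85 threshold is my own guess, not anything from Hadoop:

```python
import os
import string

PRINTABLE = set(string.printable.encode())

def looks_like_plaintext(payload: bytes, threshold: float = 0.85) -> bool:
    """Heuristic: an encrypted payload resembles uniform random bytes,
    so a high fraction of printable ASCII suggests cleartext leaked."""
    if not payload:
        return False
    printable = sum(1 for b in payload if b in PRINTABLE)
    return printable / len(payload) >= threshold

# Readable file content trips the check; random bytes (a stand-in
# for an encrypted stream) should not.
print(looks_like_plaintext(b"some,csv,row,data\n" * 50))   # True
print(looks_like_plaintext(os.urandom(4096)))              # False
```

Run that over a captured payload from port 9866: if it keeps coming back True on a transfer of text data, the stream probably isn't encrypted.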

  • You should read the HortonWorks docs about RPC encryption... Especially the part about SASL-QoP using Kerberos creds for "negotiating" and setting up the encryption key. – Samson Scharfrichter Feb 05 '20 at 09:00

1 Answer


To answer my own question: I never found a log message saying "yes, you have enabled encryption." I did, however, run a simple benchmark and noticed differences in performance consistent with encryption taking place:

Time it took to run a hadoop distcp:

  • no crypto: 5 minutes
  • 3des: 70 minutes
  • rc4: 12 minutes
  • 3des + AES, 128 bit: 16 minutes
  • 3des + AES, 256 bit: 18 minutes
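For scale, a quick back-of-the-envelope on those numbers, taking the 5-minute unencrypted run as the baseline:

```python
# Relative slowdown of each cipher configuration versus the
# unencrypted distcp baseline, using the wall-clock times above.
baseline_min = 5  # no crypto

times_min = {
    "3des": 70,
    "rc4": 12,
    "3des + AES-128": 16,
    "3des + AES-256": 18,
}

slowdown = {name: t / baseline_min for name, t in times_min.items()}
for name, factor in slowdown.items():
    print(f"{name}: {factor:.1f}x slower")
```

So plain 3des is a 14x hit, while 3des for the handshake plus AES-256 for the stream lands around 3.6x.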

Here is a bit of a Jinja template for hdfs-site.xml, which enables dfs.encrypt (3des for the key exchange, 256-bit AES for the data stream) when hadoop_dfs_encrypt=true:

<!-- SECURITY -->
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>{{ hadoop_dfs_encrypt | default(false) }}</value>
  </property>
  <property>
    <name>dfs.block.access.token.enable</name>
    <value>{{ hadoop_dfs_encrypt | default(false) }}</value>
  </property>
  <property>
    <name>dfs.encrypt.data.transfer.cipher.suites</name> 
    <value>AES/CTR/NoPadding</value>
  </property>
  <property>
    <name>dfs.encrypt.data.transfer.cipher.key.bitlength</name>
    <value>256</value> 
  </property>

From what I have read, the dfs.encrypt key exchange between the NN and the DNs is unprotected unless you set hadoop.rpc.protection=privacy. By all accounts, this requires Kerberos, but I am still researching my options there.
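For reference, that setting goes in core-site.xml; as far as I can tell, the QoP value is only honored once SASL authentication, i.e. Kerberos, is in place:

```xml
<!-- core-site.xml: request "privacy" QoP (authentication + integrity +
     encryption) on Hadoop RPC. Only effective with SASL/Kerberos auth. -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>
```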
