
In my organisation, we have a Spark Thrift server set up with HTTP and SSL, because there is an underlying assumption that binary mode is not encrypted over the wire and may therefore reveal credentials or sensitive query data.

I have Googled, skimmed a research paper and looked through the Thrift protocol spec for a definitive answer, but to no avail. Does the sheer lack of any mention of authentication and encryption mean that both are expected to be taken care of by an enclosing network layer?

Is the assumption that a Spark Thrift server in binary mode is transmitting unencrypted or otherwise insecure data correct?

QA Collective
  • 2,222
  • 21
  • 34

1 Answer


The Thrift stack does include a low-level transport layer, and that layer is where encryption can be applied:

Apache Thrift Layered Architecture

In the context of a Spark Thrift server, SSL on that transport can be enabled in the hive-site.xml file like this:

<property>
    <name>hive.server2.use.SSL</name>
    <value>true</value>
</property>

Combined with the default binary (TCP) Thrift transport mode, this does encrypt the Thrift traffic. There is not a lot of explicit documentation on this, but since the Spark Thrift server is a fork of HiveServer2, I found this guide to setting up a HiveServer2 instance, which implies it is possible:
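One way to check whether a given binary-mode endpoint is actually encrypted is to attempt a TLS handshake against it. The following is only a minimal sketch using Python's standard library; the function name `speaks_tls` is my own, and certificate verification is deliberately switched off because the goal is only to see whether the port speaks TLS at all, not whether its certificate can be trusted.

```python
import socket
import ssl


def speaks_tls(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if the endpoint completes a TLS handshake, False otherwise.

    Connection failures (refused, unreachable host) are raised to the caller,
    so that they are not mistaken for "reachable but not encrypted".
    """
    # Assumption: we only want to know whether TLS is spoken, so certificate
    # and hostname checks are disabled. Do not reuse this context for real traffic.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    # The timeout set here also applies to the handshake below.
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        with context.wrap_socket(raw, server_hostname=host):
            return True
    except OSError:
        # Covers ssl.SSLError, handshake timeouts and resets by the server:
        # a plain Thrift listener will not answer a TLS ClientHello correctly.
        return False
    finally:
        raw.close()
```

For example, `speaks_tls("thrift.example.internal", 10000)` (a hypothetical host name, with the port set to whatever your server's binary Thrift port is) should return `True` once `hive.server2.use.SSL` has taken effect, and `False` against a plain binary-mode listener. A successful handshake only shows that the transport is wrapped in TLS; it says nothing about the strength of the configuration or the validity of the certificate.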

Setting up a hive2 server

The remaining problem is that some tools, notably Power BI, do not seem to be able to use SSL for a 'Standard' (TCP Thrift protocol) connection.

QA Collective