
I have a standalone Spark cluster on Kubernetes and I want to use it to load some temp views in memory and expose them via JDBC through the Spark Thrift Server.

I already got this working with no security by submitting a Spark job (PySpark in my case) and starting the Thrift Server inside that same job, so the temp views are accessible, roughly as sketched below.
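For context, this is approximately what the current (no-authentication) job does. The paths and view names are placeholders, and `startWithContext` is reached through PySpark's internal JVM gateway (`_jvm` / `_jwrapped`), which is not a public API, so treat this as a sketch rather than a reference implementation:

```python
import time

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("thrift-temp-views")
         .enableHiveSupport()
         .getOrCreate())

# Load some data and register it as an in-memory temp view (placeholder path/name).
df = spark.read.parquet("/data/some_table")
df.createOrReplaceTempView("some_view")
spark.catalog.cacheTable("some_view")

# Start the Thrift server inside this same application so JDBC clients
# can see the temp views registered above.
jvm = spark.sparkContext._jvm
jvm.org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
   .startWithContext(spark._jwrapped)

# Keep the driver alive so the Thrift server stays up.
while True:
    time.sleep(60)
```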

Since I'll need to expose some sensitive data, I want to add at least an authentication mechanism.

I've been reading a lot, and I see basically two ways to do this:

  • PAM - not advised for production, since it requires granting permissions on some critical files to users other than root.
  • Kerberos - which appears to be the most appropriate option for this situation.

My question is:

  • For a standalone Spark cluster (running on K8s), is Kerberos the best approach? If not, which one is?
  • If Kerberos is the best option, it's really hard to find any guidance or step-by-step instructions on setting up Kerberos with the Spark Thrift Server, especially in my case where I'm not using any specific distribution (MapR, Hortonworks, etc.).

Appreciate your help

Bruno Faria
  • AFAIK Spark "Thrift Server" uses Hive libraries and Hive conf file format as-is. So read the Hive documentation and tutorials... – Samson Scharfrichter Jul 11 '19 at 09:47
  • Also, install a Horton sandbox and see how Ambari manages the Kerberos SPNs for HiveServer2 and/or Thriftserver, and sets up the conf files. – Samson Scharfrichter Jul 11 '19 at 09:50
  • Trouble is, Kerberos is **not** designed for ephemeral services and virtual IPs. Hive also supports LDAP auth, see if the libs bundled with Thriftserver also support that out-of-the-box (even if by accident). – Samson Scharfrichter Jul 11 '19 at 09:54
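Following up on the LDAP suggestion in the last comment: since the Spark Thrift Server reportedly reads Hive conf files as-is, one possible sketch is to put HiveServer2-style LDAP settings into the hive-site.xml visible to the Thrift Server. The LDAP URL and baseDN below are placeholders, and whether the Hive libraries bundled with Spark actually honor them would need testing:

```xml
<!-- hive-site.xml: switch HiveServer2/Thrift Server auth from NONE to LDAP.
     Hostname, port, and baseDN below are placeholders for illustration. -->
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://ldap.example.com:389</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value>ou=users,dc=example,dc=com</value>
</property>
```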
