0

I'm running a PySpark (Spark 3.1.1) application in cluster mode on YARN cluster, which is supposed to process input data and send appropriate kafka messages to a given topic.

Data manipulation part is already covered, however I struggle to use kafka-python library to send the notifications. The problem is that it can't find a valid kerberos ticket to authenticate to kafka cluster.

While executing spark3-submit I add --principal and --keytab properties (equivalents to spark.kerberos.keytab and spark.kerberos.principal). Moreover, I am able to access HDFS and HBase resources.

  1. Does Spark store TGT in a ticket cache that I can reference by setting krb5ccname variable? I am not able to locate a valid kerberos ticket while the app is running.
  2. Is it common to issue kinit from PySpark application to create a ticket to get an access to resources outside HDFS etc.? I tried using krbticket module to issue kinit command from the app (using keytab that I pass as a parameter to spark3-submit), however then the process hangs.
benji__
  • 47
  • 3
  • 9

0 Answers0