I'm running a PySpark (Spark 3.1.1) application on a YARN cluster in cluster mode, which is supposed to process input data and send the appropriate Kafka messages to a given topic.
The data manipulation part is already covered; however, I'm struggling to use the kafka-python library to send the notifications. The problem is that it can't find a valid Kerberos ticket to authenticate to the Kafka cluster.
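For context, the producer is created roughly like this (a simplified sketch; the broker address, topic and Kerberos service name are placeholders):

```python
from kafka import KafkaProducer

# Simplified sketch of the producer setup; authentication goes through
# SASL/GSSAPI, i.e. Kerberos. Broker, topic and service name are placeholders.
producer = KafkaProducer(
    bootstrap_servers=["broker1.example.com:9092"],
    security_protocol="SASL_PLAINTEXT",
    sasl_mechanism="GSSAPI",
    sasl_kerberos_service_name="kafka",
)
producer.send("notifications-topic", value=b"notification payload")
producer.flush()
```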
When executing `spark3-submit` I add the `--principal` and `--keytab` options (equivalents of `spark.kerberos.principal` and `spark.kerberos.keytab`). Moreover, I am able to access HDFS and HBase resources.
- Does Spark store the TGT in a ticket cache that I can reference by setting the `KRB5CCNAME` variable? I am not able to locate a valid Kerberos ticket while the app is running.
- Is it common to issue `kinit` from a PySpark application to create a ticket to get access to resources outside of HDFS etc.? I tried using the `krbticket` module to issue the `kinit` command from the app (using the keytab that I pass as a parameter to `spark3-submit`), but then the process hangs; see the sketch after this list.
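The `kinit` attempt looks more or less like the following; the exact `krbticket` calls are reproduced from memory and may not match my code exactly, and the principal and keytab path are placeholders:

```python
from krbticket import KrbTicket

# Placeholders; the real values are the ones passed to spark3-submit.
# The application hangs after issuing kinit this way.
ticket = KrbTicket.init("my_user@EXAMPLE.COM", "/path/to/my_user.keytab")
ticket.updater_start()
```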