
I'm working on a proof-of-concept Spark Streaming job that pulls from Kafka. The same code works against a non-secured Kafka 0.10 cluster, but when I switch to run against an SSL/Kerberos (HDP 2.5) setup I get an exception:

Caused by: javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner  authentication information from the user
at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:940)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:69)
at org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:110)
at org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:46)
at org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:68)
at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:78)
... 29 more

The Spark session creates fine, but when the executor fires up to consume new content from the topic I get the above exception.

The submission code is fairly simple:

spark-submit \
--master yarn \
--keytab ./bilsch.keytab \
--principal bilsch@HDP.SOME.ORG \
--files kafka_client_jaas.conf,bilsch.keytab \
--packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.0.2 \
--repositories http://repo.hortonworks.com/content/repositories/releases \
--num-executors 1 \
--class producer \
--driver-java-options "-Djava.security.auth.login.config=./kafka_client_jaas.conf -Dhdp.version=2.5.3.0-37" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./kafka_client_jaas.conf" \
streaming_0331_2.11-1.0.jar 2>&1 | tee out

If it helps, I'm following Michael Noll's Spark Streaming with Kafka tutorial.

Not sure what needs to be passed to the executors that I'm not already passing, or maybe it's just a JAAS config issue?
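For reference, a minimal `kafka_client_jaas.conf` for a keytab-based client login would look roughly like the sketch below. This is an assumption about the asker's file, not their actual config; the keytab path and principal are taken from the spark-submit command above. The "client is being asked for a password" error typically means the `KafkaClient` login context is missing (or missing `useKeyTab=true`), so `Krb5LoginModule` falls back to prompting:

```
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="./bilsch.keytab"
  principal="bilsch@HDP.SOME.ORG"
  serviceName="kafka";
};
```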

  • Actually, I at least partially figured this out myself. I had to put the JAAS config into my home dir on all of the workers (did the same with the keytab), which feels really, really wrong. What am I missing here? I thought the --files option of spark-submit would put those files into the distributed cache and make them available on the cluster for me (using ./foo.keytab, for instance). – Bill Schwanitz Apr 03 '17 at 20:51
  • Nah -- Spark does *not* use HDFS to pass resources to executors; it only passes a URL from which the executors can **download** the resources from the driver over HTTP. I guess the purpose is to use the same code base with YARN, Mesos, or standalone Spark. – Samson Scharfrichter Apr 04 '17 at 18:14
  • And the files are *not* downloaded in the current working dir -- cf. http://stackoverflow.com/questions/37055038/why-are-sc-addfile-and-spark-submit-files-not-distributing-a-local-file-to – Samson Scharfrichter Apr 04 '17 at 18:14
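One detail worth noting alongside those comments: in `--master yarn` mode specifically, resources passed via `--files` are localized by YARN into each container's working directory, so a bare relative path in `java.security.auth.login.config` usually resolves on the executors. A sketch (same flags as the question; the `#name` suffix, which renames the localized file, is optional here):

```
spark-submit \
  --master yarn \
  --files kafka_client_jaas.conf,bilsch.keytab \
  --driver-java-options "-Djava.security.auth.login.config=./kafka_client_jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./kafka_client_jaas.conf" \
  ...
```

On standalone or Mesos masters the files land elsewhere (per the linked question), which would explain why copying them to the workers' home dirs "fixed" it.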

1 Answer


Try the steps below:

  • Remove --keytab and --principal from the spark-submit command, as this information is already present in the JAAS configuration.
  • Run kinit -kt bilsch.keytab bilsch@HDP.SOME.ORG and then re-run the spark-submit command.
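If you go the kinit route, the JAAS entry should read credentials from the ticket cache rather than a keytab. A sketch (assuming the default credential cache location and the standard `Krb5LoginModule`):

```
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  serviceName="kafka";
};
```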
Ashish Singh
  • Detailed article https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/developing-spark-applications/content/running_spark_streaming_jobs_on_a_kerberos-enabled_cluster.html – Abdul Mannan Aug 19 '20 at 07:42