
I'm trying to stream messages out of Kafka with Spark Structured Streaming in Scala, following the Spark documentation, like this:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConfig = new SparkConf()
  .setAppName("Some.app.name")
  .setMaster("local")

val spark = SparkSession
  .builder
  .config(sparkConfig)
  .getOrCreate()

val dataframe = spark
  .readStream
  .format("kafka")
  .option("subscribe", kafkaTopic)
  .option("kafka.bootstrap.servers", kafkaEndpoint)
  .option("kafka.security.protocol", "SASL_PLAINTEXT")
  .option("kafka.sasl.username", "$ConnectionString")
  .option("kafka.sasl.password", kafkaConnectionString)
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("spark.kafka.clusters.cluster.sasl.token.mechanism", "SASL_PLAINTEXT")
  .option("includeHeaders", "true")
  .load()

val outputAllToConsoleQuery = dataframe
  .writeStream
  .format("console")
  .start()

outputAllToConsoleQuery.awaitTermination()

This, of course, fails with `Could not find a 'KafkaClient' entry in the JAAS configuration. System property 'java.security.auth.login.config' is not set`.

As per the Spark documentation here, "..the application can be configured via Spark parameters and may not need JAAS login configuration". I have also read the Kafka documentation. I think I get the idea, but I haven't found a way to actually code it, nor have I found any example. Could someone provide Scala code that configures Spark Structured Streaming to authenticate against Kafka and use a delegation token, without a JAAS configuration file?
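(For reference, a minimal sketch of the usual file-less alternative, which is SASL/PLAIN with an inline JAAS entry rather than a delegation token: Kafka clients accept the JAAS entry as a string through `sasl.jaas.config`, and the Spark Kafka source forwards any `kafka.`-prefixed option to the underlying client, so no `java.security.auth.login.config` file is needed. Whether this is acceptable in place of a delegation token is the open question.)

// Sketch: inline JAAS entry instead of a JAAS file, reusing the `spark`
// session and the kafkaTopic / kafkaEndpoint / kafkaConnectionString
// values from the snippet above.
val jaasEntry =
  """org.apache.kafka.common.security.plain.PlainLoginModule required """ +
  s"""username="$$ConnectionString" password="$kafkaConnectionString";"""

val dataframe = spark
  .readStream
  .format("kafka")
  .option("subscribe", kafkaTopic)
  .option("kafka.bootstrap.servers", kafkaEndpoint)
  .option("kafka.security.protocol", "SASL_PLAINTEXT")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("kafka.sasl.jaas.config", jaasEntry) // replaces the 'KafkaClient' file entry
  .load()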

Luis Mesa
  • `${cluster}` is a placeholder. Should be replaced with an actual value – OneCricketeer May 27 '21 at 13:08
  • identical result... – Luis Mesa May 28 '21 at 06:43
  • Alright, well, I'm not sure this solves anything, but the error `System property 'java.security.auth.login.config' is not set` means you need to provide that flag as a Java option to the executors, I think (see the first sketch after these comments). E.g. see "Spark parameters" in https://stackoverflow.com/questions/28166667/how-to-pass-d-parameter-or-environment-variable-to-spark-job – OneCricketeer May 28 '21 at 12:36
  • Also see the example at the bottom of the page https://spark.apache.org/docs/3.1.1/structured-streaming-kafka-integration.html#jaas-login-configuration ... I've not used these properties, but the table discussing delegation tokens seems to say you need another property for bootstrap servers, prefixed by your `spark.kafka.clusters.cluster.`, explicitly referencing the target regex setting in the docs above the table (see the second sketch after these comments) – OneCricketeer May 28 '21 at 12:44
  • @OneCricketeer, thanks, I've kind of deciphered that and tried a few options I guessed at, but none worked. At this point I'd need a working example... but I haven't been able to find any so far... – Luis Mesa May 28 '21 at 13:42
  • I would think looking into the PR around the feature and the integration tests should give some idea of how it is expected to work (or there is a bug): https://github.com/apache/spark/commit/d47c219f94f478b4b90bf6f74f78762ea301ebf9 It also seems you might need `spark-token-provider-kafka-0-10_2.12` as a dependency; see also the Kerberized example https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/DirectKerberizedKafkaWordCount.scala – OneCricketeer May 28 '21 at 15:21
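A minimal sketch of the Java-option route from the comments above, assuming a hypothetical `kafka_client_jaas.conf` file:

// Local-mode sketch only: set the property before the Kafka consumer is
// created. kafka_client_jaas.conf is a hypothetical file with an entry like
//   KafkaClient {
//     org.apache.kafka.common.security.plain.PlainLoginModule required
//     username="$ConnectionString"
//     password="...";
//   };
// On a real cluster the same flag would be passed at submit time through
// spark.driver.extraJavaOptions / spark.executor.extraJavaOptions, since
// executors do not inherit a System.setProperty done in the driver.
System.setProperty("java.security.auth.login.config", "kafka_client_jaas.conf")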
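And a second sketch of the delegation-token properties from the table in the linked integration docs; `mycluster` is an arbitrary identifier standing in for the `${cluster}` placeholder, and whether PLAIN over SASL_PLAINTEXT can actually obtain a token is exactly the open question here:

import org.apache.spark.SparkConf

val sparkConfig = new SparkConf()
  .setAppName("Some.app.name")
  .setMaster("local")
  // Where Spark obtains the delegation token; setting this enables the mechanism.
  .set("spark.kafka.clusters.mycluster.auth.bootstrap.servers", kafkaEndpoint)
  // Sources/sinks whose bootstrap servers match this regex use the token.
  .set("spark.kafka.clusters.mycluster.target.bootstrap.servers.regex", ".*")
  .set("spark.kafka.clusters.mycluster.security.protocol", "SASL_PLAINTEXT")
  // Per the docs this must be a SCRAM mechanism (default SCRAM-SHA-512),
  // not "SASL_PLAINTEXT" as in the question's code.
  .set("spark.kafka.clusters.mycluster.sasl.token.mechanism", "SCRAM-SHA-512")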

0 Answers