I am having a problem setting the required parameters to connect to Kafka from Spark using TLS. This is my current approach:

spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "<my url>:tls port")
            .option("security.protocol", "SSL")
            .option("ssl.key.password", "<key password>")
            .option("ssl.keystore.location", "<keystore location>")
            .option("ssl.keystore.password", "<keystore password>")
            .option("ssl.endpoint.identification.algorithm", "")
            .option("subscribe",  "<my topic>")
            .load()

I've also tried using the `kafka.` prefix, and including the configurations in my spark-submit (via `--conf`, or passing the `.jks` file location with `--files`). Using `spark.read` instead of `spark.readStream` doesn't solve the problem either.
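
For reference, this is roughly what the prefixed attempt looks like, with the options collected into a `Map` so the prefixed and unprefixed variants are easy to swap while testing (all values are placeholders):

// Options prefixed with "kafka." are forwarded by Spark to the underlying
// Kafka consumer; options without the prefix never reach the Kafka client.
val kafkaOptions = Map(
  "kafka.bootstrap.servers" -> "<my url>:<tls port>",
  "kafka.security.protocol" -> "SSL",
  "kafka.ssl.keystore.location" -> "<keystore location>",
  "kafka.ssl.keystore.password" -> "<keystore password>",
  "kafka.ssl.key.password" -> "<key password>",
  "kafka.ssl.endpoint.identification.algorithm" -> ""
)

val df = spark.readStream
  .format("kafka")
  .options(kafkaOptions)
  .option("subscribe", "<my topic>")
  .load()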

The problem shows up in the logs: the parameters I am setting are still null or keep their default values. The connection also fails in the same way as when I tried to connect without the TLS certificate (which my current Kafka requires):

{"Application":"My test application" ,"level": "INFO ", "timestamp": "2021-05-20 15:33:07,485", "classname": "org.apache.kafka.clients.consumer.ConsumerConfig", "body": "ConsumerConfig values: 
        [...]
        security.protocol = PLAINTEXT
        [...]
        ssl.endpoint.identification.algorithm = https
        ssl.key.password = null
        [...]
        ssl.keystore.location = null
        ssl.keystore.password = null
        [...]
{"Application":"My test application" ,"level": "WARN ", "timestamp": "2021-05-20 15:33:08,669", "classname": "org.apache.kafka.clients.NetworkClient", "body": "[Consumer clientId=consumer-sample_table-1, groupId=sample_table] Bootstrap broker <my ip> (id: -1 rack: null) disconnected"}
Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata

Currently, I am using Spark 3.0.0 and Scala 2.12. I am submitting the job with the following command:

$SPARK_HOME/bin/spark-submit --name "My application" \
--master yarn \
--deploy-mode client \
--class <main class> \
application.jar

Has anyone had a similar problem? Thank you.

Update: Using the following options solved my problem:

spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "<my url>:tls port")
            .option("kafka.security.protocol"", "SSL")
            .option("kafka.ssl.keystore.location", "<keystore location>")
            .option("kafka.ssl.keystore.password", "<keystore password>")
            .option("kafka.ssl.key.password", "<keystore password>")
            .option("kafka.ssl.endpoint.identification.algorithm", "")
            .option("subscribe",  "<my topic>")
            .load()
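
For anyone adapting this: `load()` only defines the source, so the query still needs a sink before it actually runs. A minimal sketch of that last step, assuming the reader above is bound to a value `df` (the console sink and trigger here are illustrative, not part of my actual job):

import org.apache.spark.sql.streaming.Trigger

// `df` is the DataFrame returned by load() above.
val query = df
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)") // Kafka key/value arrive as binary
  .writeStream
  .format("console")                             // illustrative sink for testing
  .trigger(Trigger.ProcessingTime("10 seconds")) // fetch new offsets every 10 seconds
  .start()

query.awaitTermination()
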
  • Start with one property at a time. [The docs say `kafka.security.protocol`](https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#security) – OneCricketeer May 20 '21 at 22:19
  • I've tried, one by one, using the prefix "kafka." and without it. Same issue. – Julia Bel May 21 '21 at 10:20
  • Hi Julia, did you find a fix for this issue? We are experiencing something similar with Databricks and AWS MSK. Thank you! – solr Sep 24 '21 at 15:58
  • Hey Solr, yes, I fixed it! I've just updated my question; I hope it helps you. You're welcome :) – Julia Bel Sep 27 '21 at 21:59
