I am having a problem setting the required parameters to connect to Kafka from Spark using TLS. This is my current approach:
spark.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "<my url>:tls port")
.option("security.protocol", "SSL")
.option("ssl.key.password", "<key password>")
.option("ssl.keystore.location", "<keystore location>")
.option("ssl.keystore.password", "<keystore password>")
.option("ssl.endpoint.identification.algorithm", "")
.option("subscribe", "<my topic>")
.load()
I've also tried using the kafka. prefix and passing the configurations in my spark-submit (using --conf, or including the .jks file location in --files). Using spark.read instead of spark.readStream doesn't solve the problem either.
The problem shows up in the logs: the parameters I am setting are still null or keep their default values, and the connection fails just as it did when I tried to connect without the TLS certificate (which my current Kafka requires):
{"Application":"My test application" ,"level": "INFO ", "timestamp": "2021-05-20 15:33:07,485", "classname": "org.apache.kafka.clients.consumer.ConsumerConfig", "body": "ConsumerConfig values:
[...]
security.protocol = PLAINTEXT
[...]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
[...]
ssl.keystore.location = null
ssl.keystore.password = null
[...]
{"Application":"My test application" ,"level": "WARN ", "timestamp": "2021-05-20 15:33:08,669", "classname": "org.apache.kafka.clients.NetworkClient", "body": "[Consumer clientId=consumer-sample_table-1, groupId=sample_table] Bootstrap broker <my ip> (id: -1 rack: null) disconnected"}
Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
Currently, I am using Spark 3.0.0 and Scala 2.12. Also, I am submitting the job using the following command:
$SPARK_HOME/spark-submit --name "My application" \
--master yarn \
--deploy-mode client \
--class <main class> \
application.jar
Has anyone had a similar problem? Thank you.
Update: using the following options solved my problem:
spark.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "<my url>:tls port")
.option("kafka.security.protocol", "SSL")
.option("kafka.ssl.keystore.location", "<keystore location>")
.option("kafka.ssl.keystore.password", "<keystore password>")
.option("kafka.ssl.key.password", "<key password>")
.option("kafka.ssl.endpoint.identification.algorithm", "")
.option("subscribe", "<my topic>")
.load()
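For completeness, the same kafka.-prefixed options should also work for a batch read with spark.read, since the prefix is what tells Spark's Kafka source to pass them through to the underlying Kafka consumer. A sketch under that assumption, with the same placeholders as above:

```scala
// Sketch of the batch equivalent: identical kafka.-prefixed options,
// only spark.readStream is swapped for spark.read.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "<my url>:tls port")
  .option("kafka.security.protocol", "SSL")
  .option("kafka.ssl.keystore.location", "<keystore location>")
  .option("kafka.ssl.keystore.password", "<keystore password>")
  .option("kafka.ssl.key.password", "<key password>")
  // Empty string disables hostname verification, as in the streaming version.
  .option("kafka.ssl.endpoint.identification.algorithm", "")
  .option("subscribe", "<my topic>")
  .load()
```

The key point in both variants is the kafka. prefix: options without it (as in the original attempt) are treated as Spark source options rather than Kafka client configuration, which is why the ConsumerConfig log showed the SSL settings as null.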