
I am running Spark in cluster mode, and it is giving this error:

ERROR SslEngineBuilder: Modification time of key store could not be obtained: hdfs://ip:port/user/hadoop/jks/kafka.client.truststore.jks
java.nio.file.NoSuchFileException: hdfs:/ip:port/user/hadoop/jks/kafka.client.truststore.jks

I ran the command below and verified that the JKS files are present at that location.

hadoop fs -ls hdfs://ip:port/user/hadoop/<folder1>

I have written the following code to connect to Kafka in my Spark project.

Spark Code:

sparkSession.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", )
  .option("subscribe", )
  ...
  .option("kafka.ssl.keystore.password", "pswd")
  .option("kafka.ssl.key.password", "pswrd")
  .option("kafka.ssl.truststore.location", "hdfs:///node:port/user/hadoop/<folder1>/kafka.client.truststore.jks")
  .option("kafka.ssl.keystore.location", "hdfs:///node:port/user/hadoop/<folder1>/kafka.client.keystore.jks")
  1. What is missing here?
  2. How can I achieve the same with the JKS files in S3?
Oli

1 Answer


You need to pass `--files s3a://...` (or an `hdfs://...` path) as a `spark-submit` option, or set the `spark.files` option when building the session.

Then you can refer to those files by file name alone (not the full path), since they are resolved relative to the working directory of each Spark executor.
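For example, a sketch of the `spark-submit` approach (the HDFS paths and jar name are placeholders; substitute your own):

```shell
# Ship both JKS files so Spark copies them into each executor's
# working directory before the job starts.
spark-submit \
  --files hdfs://ip:port/user/hadoop/jks/kafka.client.truststore.jks,hdfs://ip:port/user/hadoop/jks/kafka.client.keystore.jks \
  your-app.jar
```

In the Spark code, the SSL options then use the bare file name, e.g. `.option("kafka.ssl.truststore.location", "kafka.client.truststore.jks")`, because Kafka's SSL engine can only read local files, not `hdfs://` or `s3a://` URIs.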

For reading from S3, you'll also need to define your S3 access keys securely (i.e. not as plaintext in your Spark code). Use an `hdfs-site.xml` resource file.
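As a sketch, assuming the S3A connector, a Hadoop credential provider keeps the keys out of plaintext config (the `jceks` path here is a placeholder):

```xml
<!-- hdfs-site.xml (or core-site.xml) on the Spark classpath.
     Points Hadoop at an encrypted credential store instead of
     embedding fs.s3a.access.key / fs.s3a.secret.key in plaintext. -->
<configuration>
  <property>
    <name>hadoop.security.credential.provider.path</name>
    <value>jceks://hdfs@namenode:port/user/hadoop/s3.jceks</value>
  </property>
</configuration>
```

The store itself can be populated with the Hadoop credential CLI, e.g. `hadoop credential create fs.s3a.access.key -provider jceks://hdfs@namenode:port/user/hadoop/s3.jceks` (and likewise for `fs.s3a.secret.key`).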

OneCricketeer
  • Hi @OneCricketeer, thank you so much! The files are accessible now, but I am not able to connect to the Kafka broker. Error: `WARN NetworkClient: [Consumer clientId=consumer-1, groupId=spark-kafka-x-xx-driver-0] Connection to node -1 could not be established. Broker may not be available.` Messages are not getting consumed. – richa bharwal Jan 06 '23 at 05:25
  • That could just be a generic network issue (firewall, etc. denying the connection), nothing to do with Spark. SSH to any of your Spark executors and run the Kafka CLI tools directly to debug. – OneCricketeer Jan 06 '23 at 13:40
  • Yes, it's working now. The Spark executor was not whitelisted. Thanks for your support! @OneCricketeer – richa bharwal Jan 10 '23 at 06:50