
I have a Spark Structured Streaming job in Scala that reads from Kafka and writes to S3 as Hudi tables. Now I am trying to move this job to the Spark Operator on EKS.

I set this option in the YAML file:

spark.jars.packages: org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1
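For reference, this is roughly where the option sits in a SparkApplication spec for the Spark Operator — a minimal sketch; the application name, image, and main class are placeholders, not from my actual job:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: kafka-to-hudi          # hypothetical name
  namespace: default
spec:
  type: Scala
  mode: cluster
  sparkVersion: "3.1.2"
  mainClass: com.example.StreamingJob   # placeholder
  sparkConf:
    "spark.jars.packages": "org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1"
```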

But I still get this error on both the driver and the executors:

java.lang.ClassNotFoundException: org.apache.spark.sql.kafka010.KafkaBatchInputPartition

How do I add the package org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 so that it works?

Edit: It seems this is a known issue that is only fixed in the yet-to-be-released Spark 3.4. Based on the suggestions here and here, I had to bake all the jars (spark-sql-kafka-0-10_2.12-3.1.2 and its dependencies, and also the Hudi jar) into the Spark image. Then it worked.
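The workaround of baking the jars into the image can be sketched as a Dockerfile along these lines. The base image tag and the dependency jar versions listed are assumptions for Spark 3.1.2; verify the exact transitive dependencies of the Kafka connector for your Spark version:

```dockerfile
# Base image is an assumption; use the Spark image your operator deployment is built on
FROM apache/spark:v3.1.2

# Copy the Kafka connector, its runtime dependencies, and the Hudi bundle
# into Spark's classpath directory so driver and executors find them at startup
COPY jars/spark-sql-kafka-0-10_2.12-3.1.2.jar            /opt/spark/jars/
COPY jars/spark-token-provider-kafka-0-10_2.12-3.1.2.jar /opt/spark/jars/
COPY jars/kafka-clients-2.6.0.jar                        /opt/spark/jars/
COPY jars/commons-pool2-2.6.2.jar                        /opt/spark/jars/
COPY jars/hudi-spark3.1-bundle_2.12-0.11.1.jar           /opt/spark/jars/
```

With the jars on the image's classpath, the `spark.jars.packages` option (which requires runtime dependency resolution) is no longer needed, and the `image:` field of the SparkApplication spec points at this image.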
