
I am using Spark Streaming with the Kafka integration. When I run the streaming application from my IDE in local mode, everything works like a charm. However, as soon as I submit it to the cluster I keep getting the following error:

java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.StringDeserializer

I am using sbt assembly to build my project.

My sbt build is as follows:

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0" % Provided,
  "org.apache.spark" % "spark-core_2.11" % "2.2.0" % Provided,
  "org.apache.spark" % "spark-streaming_2.11" % "2.2.0" % Provided,
  "org.marc4j" % "marc4j" % "2.8.2",
  "net.sf.saxon" % "Saxon-HE" % "9.7.0-20"
)


run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated


mainClass in assembly := Some("EstimatorStreamingApp")
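
For reference, this is a minimal sketch of how I submit the assembled jar to the cluster; the jar path, master, and deploy mode below are assumptions based on a default sbt-assembly layout:

spark-submit \
  --class EstimatorStreamingApp \
  --master yarn \
  --deploy-mode cluster \
  target/scala-2.11/estimator-streaming-app-assembly-1.0.jar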

I also tried to use the --packages option:

Attempt 1:

--packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0

Attempt 2:

--packages org.apache.spark:spark-streaming-kafka-0-10-assembly_2.11:2.2.0

All with no success. Does anyone have anything to suggest?


1 Answer


You need to remove the Provided flag from the Kafka dependency, as it is not provided out of the box with Spark. Provided dependencies are excluded from the assembly jar, which is why the class cannot be found at runtime on the cluster:

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0",
  "org.apache.spark" % "spark-core_2.11" % "2.2.0" % Provided,
  "org.apache.spark" % "spark-streaming_2.11" % "2.2.0" % Provided,
  "org.marc4j" % "marc4j" % "2.8.2",
  "net.sf.saxon" % "Saxon-HE" % "9.7.0-20"
)
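
After rebuilding the assembly, you can check that the Kafka serializer class actually made it into the fat jar before resubmitting; the jar path below is an assumption based on a default sbt-assembly output location:

jar tf target/scala-2.11/estimator-streaming-app-assembly-1.0.jar | grep StringDeserializer

If that prints org/apache/kafka/common/serialization/StringDeserializer.class, the dependency is bundled and the ClassNotFoundException should be gone.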