I have a problem using the updateStateByKey() function. I have the following simple code (based on the book "Learning Spark: Lightning-Fast Big Data Analysis"):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object hello {
  // Adds the number of new records for a key to its running count.
  def updateStateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
    Some(runningCount.getOrElse(0) + newValues.size)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[5]").setAppName("AndrzejApp")
    val ssc = new StreamingContext(conf, Seconds(4))
    ssc.checkpoint("/")
    val lines7 = ssc.socketTextStream("localhost", 9997)
    // Each input line is expected to look like "<key> <integer value>".
    val keyValueLine7 = lines7.map(line => (line.split(" ")(0), line.split(" ")(1).toInt))
    val statefullStream = keyValueLine7.updateStateByKey(updateStateFunction _)
    ssc.start()
    ssc.awaitTermination()
  }
}
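To make clear what I expect the update function to do, here is a minimal sketch that calls it by hand, outside Spark. The object name UpdateStateDemo and the sample values are made up for illustration. It only mimics how the state would evolve for a single key across batches and does not go through updateStateByKey itself.

```scala
object UpdateStateDemo {
  // Same logic as updateStateFunction above: adds the number of new
  // records for a key to its running count, ignoring the values themselves.
  def updateStateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
    Some(runningCount.getOrElse(0) + newValues.size)

  def main(args: Array[String]): Unit = {
    // First batch: no previous state, three records arrive for the key.
    val afterFirstBatch = updateStateFunction(Seq(10, 20, 30), None)
    // Second batch: one more record arrives for the same key.
    val afterSecondBatch = updateStateFunction(Seq(5), afterFirstBatch)
    // A batch with no records for the key leaves the count unchanged.
    val afterEmptyBatch = updateStateFunction(Seq.empty, afterSecondBatch)

    println(afterFirstBatch)  // Some(3)
    println(afterSecondBatch) // Some(4)
    println(afterEmptyBatch)  // Some(4)
  }
}
```

So the state I want per key is the number of records seen so far, not the sum of their values.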
My build.sbt is:
name := "stream-correlator-spark"

version := "1.0"

scalaVersion := "2.11.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.3.1" % "provided"
)
When I build it with the sbt assembly command, everything works fine. When I run it on a Spark cluster in local mode, I get this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/dstream/DStream$
    at hello$.main(helo.scala:25)
    ...
Line 25 is:
val statefullStream = keyValueLine7.updateStateByKey(updateStateFunction _)
I suspect this is a version compatibility problem, but I don't know what causes it or how to resolve it.
I would be really grateful for help!