I have a Spark Streaming application that I am trying to run on a 5-node cluster (including the master). I have 2 ZooKeeper and 3 Kafka nodes. I am trying to run the HiBench Streaming Benchmarks as an example app. However, whenever I run a Spark Streaming application I encounter the following error:
java.lang.IllegalArgumentException: requirement failed: numRecords must not be negative
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
at org.apache.spark.streaming.kafka.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:165)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
at scala.Option.orElse(Option.scala:289)
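For context, the DirectKafkaInputDStream in the trace is the stream produced by KafkaUtils.createDirectStream. HiBench wires this up internally; the sketch below is only my minimal reconstruction of the equivalent setup, with the broker list, topic name, and batch interval as placeholders rather than my actual configuration:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DirectStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder Kafka settings; the real values come from the HiBench config.
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> "kafka1:9092,kafka2:9092,kafka3:9092")
    val topics = Set("hibench_topic")

    // This call returns the DirectKafkaInputDStream whose compute() fails
    // in the stack trace above with "numRecords must not be negative".
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}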
I have tried removing the Spark Streaming checkpoint files as suggested in this similar question. However, the problem persists even when I start a Kafka topic and its corresponding consumer Spark Streaming application for the first time. The problem should also not be offset-related, since I am starting the topic for the first time.
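For reference, this is the checkpoint-recovery pattern I understand the suggestion about deleting checkpoint files to be aimed at; a minimal sketch, where the checkpoint path and batch interval are placeholders, not my actual job settings:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointSketch {
  // Placeholder checkpoint location; the real path is set in the job config.
  val checkpointDir = "hdfs:///tmp/spark-streaming-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointSketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... stream setup would go here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // If checkpointDir already holds data from an earlier run, the context
    // (including previously recorded Kafka offsets) is restored from it
    // instead of being rebuilt, which is why stale checkpoints are the
    // usual suspect for this error. In my case the directory is fresh.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}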