I have a Spark Streaming application that I am trying to run on a 5-node cluster (including the master). I have 2 ZooKeeper and 3 Kafka nodes. I am trying to run the HiBench Streaming Benchmarks as an example app. However, whenever I run a Spark Streaming application I encounter the following error:
java.lang.IllegalArgumentException: requirement failed: numRecords must not be negative
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
at org.apache.spark.streaming.kafka.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:165)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
at scala.Option.orElse(Option.scala:289)
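For context, the DirectKafkaInputDStream in the trace is the stream produced by KafkaUtils.createDirectStream. HiBench wires this up internally; the sketch below is only my minimal reconstruction of the equivalent setup, with the broker list, topic name, and batch interval as placeholders rather than my actual configuration:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DirectStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder Kafka settings; the real values come from the HiBench config.
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> "kafka1:9092,kafka2:9092,kafka3:9092")
    val topics = Set("hibench_topic")

    // This call returns the DirectKafkaInputDStream whose compute() fails
    // in the stack trace above with "numRecords must not be negative".
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}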
I have tried removing the Spark Streaming checkpoint files as suggested in this similar question. However, the problem persists even when I start a Kafka topic and its corresponding consumer Spark Streaming application for the first time. The problem should also not be offset-related, since I am starting the topic for the first time.
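For reference, this is the checkpoint-recovery pattern I understand the suggestion about deleting checkpoint files to be aimed at; a minimal sketch, where the checkpoint path and batch interval are placeholders, not my actual job settings:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointSketch {
  // Placeholder checkpoint location; the real path is set in the job config.
  val checkpointDir = "hdfs:///tmp/spark-streaming-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointSketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... stream setup would go here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // If checkpointDir already holds data from an earlier run, the context
    // (including previously recorded Kafka offsets) is restored from it
    // instead of being rebuilt, which is why stale checkpoints are the
    // usual suspect for this error. In my case the directory is fresh.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}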