How do I store the message offset in Kafka if I am using KafkaUtils.createDirectStream to read the messages? Kafka loses the offset value every time the application goes down. On restart it falls back to the value of auto.offset.reset (which is latest) and therefore misses the messages produced during the stop-start interval of the application.
1 Answer
You can avoid that by committing the offsets manually. Set enable.auto.commit to false, then use the code below to commit the offsets back to Kafka after the processing has succeeded.
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, OffsetRange}

// Capture the offset ranges on the driver before transforming the RDD
var offsetRanges = Array.empty[OffsetRange]

val valueStream = stream.transform { rdd =>
  offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd
}.map(_.value())

// ... process valueStream ...

// After successful processing, commit the captured offsets back to Kafka
stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
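For reference, this assumes the stream was created with enable.auto.commit disabled in the consumer config. A minimal sketch of such a setup, with placeholder broker, group, and topic names:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Placeholder values; substitute your own brokers, group id, and topic
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-consumer-group",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean) // commit manually instead
)

val stream = KafkaUtils.createDirectStream[String, String](
  streamingContext, // an existing StreamingContext
  PreferConsistent,
  Subscribe[String, String](Array("my-topic"), kafkaParams)
)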
You can also read this doc, which gives a good overview of offset management: https://blog.cloudera.com/blog/2017/06/offset-management-for-apache-kafka-with-apache-spark-streaming/

Rishi Saraf
- Thanks for your response, Rishi. I implemented the above in my code, but I am getting: Caused by: java.io.NotSerializableException: Object of org.apache.spark.streaming.kafka010.DirectKafkaInputDStream is being serialized possibly as a part of closure of an RDD operation. This is because the DStream object is being referred to from within the closure. Please rewrite the RDD operation inside this DStream to avoid this. This has been enforced to avoid bloating of Spark tasks with unnecessary objects. – user1326784 Jan 26 '19 at 22:52
- Can I use commitSync in my case if I am using KafkaUtils.createDirectStream to read the messages? – user1326784 Feb 06 '19 at 22:42
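Regarding the NotSerializableException in the first comment: that error typically means the DStream reference ended up inside a closure that Spark ships to executors. A sketch of the pattern from the Spark documentation that avoids it, where the stream is only referenced in code that runs on the driver:

stream.foreachRDD { rdd =>
  // This block runs on the driver, so referencing `stream` here is safe
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // Only the RDD (not the DStream) is used inside executor-side closures
  rdd.map(record => record.value()).foreachPartition { partition =>
    partition.foreach { value =>
      // ... process each value ...
    }
  }

  // Commit after the work for this batch has succeeded
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}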
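As for commitSync: the CanCommitOffsets trait in spark-streaming-kafka-0-10 only exposes commitAsync; there is no commitSync on the direct stream. If you need to observe whether a commit succeeded, one option is the commitAsync overload that accepts Kafka's OffsetCommitCallback, sketched here:

import org.apache.kafka.clients.consumer.{OffsetAndMetadata, OffsetCommitCallback}
import org.apache.kafka.common.TopicPartition

stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges, new OffsetCommitCallback {
  override def onComplete(offsets: java.util.Map[TopicPartition, OffsetAndMetadata],
                          exception: Exception): Unit = {
    // exception is null on success; non-null if the commit failed
    if (exception != null) {
      println(s"Offset commit failed: ${exception.getMessage}")
    }
  }
})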