I'm writing a Spark Streaming application reading from Kafka. To get exactly-once semantics, I'd like to use the direct Kafka stream together with Spark Streaming's native checkpointing.
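For context, my setup looks roughly like the following sketch (broker address, topic name, and checkpoint path are placeholders, and the processing logic is elided):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object MyJob {
  // Placeholder checkpoint directory (in production this is on HDFS/S3)
  val checkpointDir = "hdfs:///checkpoints/my-job"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("my-job")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    // Direct stream: offsets are tracked by Spark itself, not ZooKeeper,
    // and are persisted as part of the checkpoint
    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic"))

    stream.foreachRDD { rdd =>
      // ... processing ...
    }
    ssc
  }

  def main(args: Array[String]): Unit = {
    // getOrCreate restores the context (including Kafka offsets)
    // from the checkpoint directory if one exists
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```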
The problem is that checkpointing makes it practically impossible to maintain the code: if you change anything, you lose the checkpointed data, so you are almost forced to re-read some messages from Kafka, which I'd like to avoid.
So I tried to read the data in the checkpoint directory myself, but so far I haven't managed to. Can someone tell me how to read the last processed Kafka offsets from the checkpoint folder?
Thank you, Marco