I am researching Kafka for a specific use case I am working on. I have a stream of incoming data that I want to process and publish to intermediary stages.
At each of these stages (initial and intermediary), Samza tasks would do the processing and republishing. One of my requirements is to be able to re-trigger the whole processing pipeline from a specific stage and point in time whenever I want.
I know that Kafka maintains an offset for each message in its logs (the incoming data). However, does Kafka provide any functionality with which I can map partition offsets to some custom identifier (say, a timestamp) and use this to re-trigger the whole pipeline from that point onwards?
I have read in multiple places that I can replay the Kafka commit log by resetting the consumer offset to the beginning, or by rewinding it by some N messages. But is there a way for me to map these offsets to my own identifier, such as a timestamp, and use that as a mechanism to tell the consumer from which offset to replay?
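To make the question concrete, below is a rough Java sketch of the kind of thing I am hoping is possible. It assumes a newer consumer client that can translate a timestamp into an offset (offsetsForTimes, as far as I can tell) and then seek to it; the broker address, group id, and topic/partition names are just placeholders for illustration:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "replay-test");                // hypothetical consumer group
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Hypothetical topic/partition of one intermediary stage I want to replay.
            TopicPartition tp = new TopicPartition("intermediate-stage-1", 0);
            consumer.assign(Collections.singletonList(tp));

            // Map my custom identifier (a timestamp) to a partition offset.
            long replayFromMillis = System.currentTimeMillis() - 60 * 60 * 1000L; // 1 hour ago
            Map<TopicPartition, Long> query = new HashMap<>();
            query.put(tp, replayFromMillis);
            Map<TopicPartition, OffsetAndTimestamp> result = consumer.offsetsForTimes(query);

            // Rewind the consumer to that offset and re-process from there onwards.
            OffsetAndTimestamp oat = result.get(tp);
            if (oat != null) {
                consumer.seek(tp, oat.offset());
            }
            // ... poll() and re-process records from this point on ...
        }
    }
}
```

Is this roughly the intended way to do it, or is there a better mechanism (in Kafka or Samza) for mapping a custom identifier to an offset and replaying from it?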
Best
Shabir