
I am researching Kafka for a specific use case. I have a stream of incoming data that I want to process and publish to intermediate stages.

At each of these stages (initial and intermediate), Samza tasks would do the processing and republishing. One of my requirements is to be able to re-trigger the whole processing pipeline from a specific point in time whenever I want.

I know that Kafka maintains an offset for each of its logs (incoming data). However, does Kafka provide any functionality for mapping partition offsets to a custom identifier (say, a timestamp) and using it to re-trigger the whole pipeline from that point onwards?

I have read in multiple places that I can replay the Kafka commit log by resetting it to the beginning or by going back N offsets. But is there a way to map these offsets to my own identifier, such as timestamps, and use that to specify the offset to replay from?

Best
Shabir

Matthias J. Sax
Shabirmean

1 Answer


You can use the command-line tool kafka-consumer-groups to reset a consumer group's offsets based on a timestamp (--reset-offsets --to-datetime). See the documentation for details: https://kafka.apache.org/documentation/#basic_ops_consumer_group

The same can, of course, be achieved in code.
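To illustrate what a timestamp-based lookup does, here is a minimal, stdlib-only model of one partition's offset/timestamp index. PartitionTimeIndex and offsetForTime are hypothetical names used only for this sketch. In the real Java client the equivalent call is KafkaConsumer.offsetsForTimes(Map<TopicPartition, Long>), which returns, per partition, the earliest offset whose timestamp is greater than or equal to the requested one (or null if there is none); you would then call consumer.seek(partition, offset). The linear scan below is for clarity only and is not how the broker implements the query.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory model of a single partition, illustrating the
 * semantics of KafkaConsumer.offsetsForTimes: return the earliest offset
 * whose record timestamp is >= the requested timestamp.
 */
class PartitionTimeIndex {
    // Offsets and their record timestamps (epoch millis), stored in offset order.
    private final List<Long> offsets = new ArrayList<>();
    private final List<Long> timestamps = new ArrayList<>();

    /** Appends a record; offsets are assumed to be added in increasing order. */
    void append(long offset, long timestampMs) {
        offsets.add(offset);
        timestamps.add(timestampMs);
    }

    /**
     * Returns the earliest offset (in log order) whose timestamp is >= targetMs,
     * or -1 if no such record exists (the real client returns null instead).
     * A linear scan is used so out-of-order producer timestamps are handled.
     */
    long offsetForTime(long targetMs) {
        for (int i = 0; i < offsets.size(); i++) {
            if (timestamps.get(i) >= targetMs) {
                return offsets.get(i);
            }
        }
        return -1L;
    }
}
```

Replaying a whole topic from a point in time means doing this lookup for every partition and seeking each one to its returned offset; each downstream stage then reprocesses whatever the upstream stage republishes.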

Natalia
  • Thank you. This is sort of what I was expecting. Is there some Kafka SDK that lets us directly access this information from code and re-configure the offset? – Shabirmean Feb 25 '20 at 03:58
  • Also, can I tag offsets with my own field, so that when I say "run from this value of this field" it picks the correct offset? – Shabirmean Feb 25 '20 at 05:31
  • https://medium.com/@werneckpaiva/how-to-seek-kafka-consumer-offsets-by-timestamp-de351ba35c61 Check this post for how to do this in code. – Natalia Feb 25 '20 at 11:53
  • As far as I know, Kafka doesn't let you attach any "tags" to offsets. You are free to implement that yourself using any database (for example, a table: topic, partition, offset -> tag). – Natalia Feb 25 '20 at 11:54
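As a sketch of the "implement it yourself" suggestion in the last comment, the hypothetical OffsetTagStore below keeps the topic/partition/offset/tag mapping in memory; a real deployment would persist it in a database table as described. Because a tag marks a position in each partition separately, a lookup returns one offset per partition. You would then call consumer.seek(new TopicPartition(topic, partition), offset) for each entry before polling.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical tag store: an in-memory stand-in for a database table with
 * columns (topic, partition, offset, tag). Not part of any Kafka API.
 */
final class OffsetTagStore {
    // topic -> tag -> (partition -> offset)
    private final Map<String, Map<String, Map<Integer, Long>>> index = new HashMap<>();

    /** Records that `tag` marks `offset` in the given topic-partition, overwriting any earlier mapping. */
    void tag(String topic, int partition, long offset, String tag) {
        index.computeIfAbsent(topic, t -> new HashMap<>())
             .computeIfAbsent(tag, t -> new HashMap<>())
             .put(partition, offset);
    }

    /** Returns a read-only partition -> offset map for the tag, or an empty map if it is unknown. */
    Map<Integer, Long> offsetsForTag(String topic, String tag) {
        Map<Integer, Long> byPartition = index
            .getOrDefault(topic, Collections.<String, Map<Integer, Long>>emptyMap())
            .get(tag);
        if (byPartition == null) {
            return Collections.<Integer, Long>emptyMap();
        }
        return Collections.unmodifiableMap(byPartition);
    }
}
```

The producer (or the stage that decides what a tag means) would write these rows as it publishes, using the offsets reported back for each sent record.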