What is the best way to let a consumer always read the latest (most recent) message of a key? Is there a solution in Kafka itself, or do I need Kafka Streams for it?
An example:
The messages m with the keys K1-K3 are in a log:
|K1 m0 | K2 m1 | K3 m2 | K2 m3 | K1 m4 | K2 m5 | K1 m6 | K3 m7 | K1 m8| ...
---------------------------------------------------------------------------------> t
t1                                                     t2
Two consumers read the messages with key K1. Consumer 1 starts at t1 and consumer 2 starts at t2. I want consumer 2 to start reading at m6, the most recent K1 message at t2.
consumer 1: m0, m4, m6, m8, ...
consumer 2: m6, m8, ...
My two approaches:
- Use auto.offset.reset='latest'. There are two problems with this approach. The first is that there is usually already an initial offset in Kafka (a committed offset for the consumer group), so the auto.offset.reset setting is never applied. The second is that even if I also set enable_auto_commit=False, 'latest' positions the consumer at the end of the log, so consumer 2 would start with m8 and not with m6.
- Start consumer 2 at a specific offset. Here I don't know how to find the correct offset, especially since it depends on the key. For this example I need the offset of "K1 m6".
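
As a rough sketch of the second approach: one way to find the offset of "K1 m6" is to scan the partition once from the beginning up to its current end and remember the last offset seen for the key. The code below assumes the kafka-python client (suggested by the `enable_auto_commit` spelling) and that all K1 messages live in a single partition. The names `Record`, `last_offset_for_key` and `seek_to_last_message_of_key` are hypothetical helpers, not part of any library.

```python
from collections import namedtuple

# Stand-in for kafka-python's ConsumerRecord, which also exposes .offset and .key.
Record = namedtuple("Record", ["offset", "key", "value"])


def last_offset_for_key(records, key):
    """Return the offset of the newest record with the given key, or None."""
    last = None
    for record in records:
        if record.key == key:
            last = record.offset
    return last


def seek_to_last_message_of_key(consumer, tp, key):
    """Scan one partition up to its current end, then seek to the newest record for `key`.

    Assumption: `consumer` is a kafka-python KafkaConsumer already assigned to `tp`.
    """
    end = consumer.end_offsets([tp])[tp]  # offset just past the last record right now
    consumer.seek_to_beginning(tp)
    last = None
    while consumer.position(tp) < end:
        batch = consumer.poll(timeout_ms=1000).get(tp, [])
        # Ignore anything produced after `end` was captured.
        found = last_offset_for_key((r for r in batch if r.offset < end), key)
        if found is not None:
            last = found
    # Fall back to the end of the log if the key has never been written.
    consumer.seek(tp, last if last is not None else end)
    return last


# The example log from the question, with offsets 0..8.
log = [Record(i, k, "m%d" % i) for i, k in enumerate(
    ["K1", "K2", "K3", "K2", "K1", "K2", "K1", "K3", "K1"])]
# At t2 only m0..m7 exist, so the newest K1 record is m6.
offset_at_t2 = last_offset_for_key(log[:8], "K1")
```

With kafka-python, `seek_to_last_message_of_key` would be called after `consumer.assign([tp])`, where `tp` is a `TopicPartition`, and keys would be compared as bytes unless a `key_deserializer` is configured. The consumer still has to skip records with other keys afterwards, and the scan reads the whole partition on every start, so this is only practical for small or compacted topics.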