What is the best way to let a consumer always read the latest message for a key? Is there a solution in Kafka itself, or do I need Kafka Streams for it?

An Example:

The messages m with the keys K1-K3 are in a log:

                
    |K1 m0 | K2 m1 | K3 m2 | K2 m3 | K1 m4 | K2 m5 | K1 m6 | K3 m7 | K1 m8| ...
---------------------------------------------------------------------------------> t
 t1                                                             t2

Two consumers read the messages with key K1. Consumer 1 starts at t1 and consumer 2 starts at t2. I want consumer2 to read from m6 on.

consumer 1: m0, m4, m6, m8, ...
consumer 2: m6, m8, ...

My two approaches:

  1. Use auto.offset.reset='latest'. There are two problems with this approach. First, Kafka usually already has an initial offset stored, so auto.offset.reset is never applied. Second, even if I also set enable_auto_commit=False, consumer 2 would start with m8 and not with m6.
  2. Start consumer 2 at a specific offset. Here I don't know how to obtain the correct offset, especially one that takes the keys into account. For this example I need the offset of "K1 m6".
synpho

1 Answer

The proper solution would indeed be to read the entire topic into a KTable, then either join against that table or use Interactive Queries for the respective keys.
If you have these keys in only one partition, you cannot have two consumers of the same group pointing at different records.
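
Conceptually, a KTable is the topic's changelog folded into a map that keeps only the newest value per key. The sketch below imitates that idea in plain Python; it is not the Kafka Streams API (which is Java/Scala), and `materialize_table` and the `(key, value)` tuples are hypothetical stand-ins for real records.

```python
from typing import Dict, Hashable, Iterable, Tuple, TypeVar

V = TypeVar("V")


def materialize_table(changelog: Iterable[Tuple[Hashable, V]]) -> Dict[Hashable, V]:
    """Fold a changelog into a table holding the latest value per key."""
    table: Dict[Hashable, V] = {}
    for key, value in changelog:
        # A later record for the same key overwrites the earlier one.
        table[key] = value
    return table


# The log from the question, up to the point where consumer 2 starts.
log = [
    ("K1", "m0"), ("K2", "m1"), ("K3", "m2"), ("K2", "m3"),
    ("K1", "m4"), ("K2", "m5"), ("K1", "m6"), ("K3", "m7"),
]
table = materialize_table(log)
print(table["K1"])  # m6
```

Looking up a key in such a table is roughly what an Interactive Query does against the state store behind a KTable.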

The auto.offset.reset setting only takes effect when the consumer's group.id has no committed offset; it is not related to an "initial offset" of the topic itself, and it cannot be set for any specific key(s).
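
If you would rather stay with a plain consumer, one possible workaround is to scan a single partition once, remember the highest offset seen for the key, and then seek to it. This is a minimal sketch, assuming a kafka-python style `KafkaConsumer` and a key that lives in one known partition; the helper names `last_offset_for_key` and `seek_to_last_message_of_key` are made up for illustration.

```python
from typing import Hashable, Iterable, Optional, Tuple


def last_offset_for_key(
    records: Iterable[Tuple[int, Hashable]], key: Hashable
) -> Optional[int]:
    """Return the offset of the last record carrying `key`, or None."""
    last = None
    for offset, record_key in records:
        if record_key == key:
            last = offset
    return last


def seek_to_last_message_of_key(consumer, topic_partition, key):
    """Position `consumer` on the latest record for `key` in one partition.

    Assumes `consumer` behaves like kafka-python's KafkaConsumer and that
    record keys compare equal to `key` (raw keys are bytes unless a
    key_deserializer is configured).
    """
    consumer.assign([topic_partition])
    # Snapshot the end of the log so the scan has a fixed stopping point.
    end = consumer.end_offsets([topic_partition])[topic_partition]
    consumer.seek_to_beginning(topic_partition)

    seen = []
    while consumer.position(topic_partition) < end:
        batch = consumer.poll(timeout_ms=1000)
        for record in batch.get(topic_partition, []):
            if record.offset < end:
                seen.append((record.offset, record.key))

    offset = last_offset_for_key(seen, key)
    if offset is not None:
        # The next poll() returns the latest record for `key` first.
        consumer.seek(topic_partition, offset)
    return offset
```

The scan reads the whole partition, so it gets slower as the topic grows, and the consumer still has to skip records with other keys afterwards. This is the work a KTable would otherwise do for you.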

OneCricketeer