
I am using a GlobalKTable as a key-value store. The data in the store changes rarely, but it does change.

Once a week I receive a file, and the contents of that file are put onto a stream in a key-value pair format. This stream is then stored as a GlobalKTable by the applications that require the data.
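
For reference, the table is built roughly like this (a minimal sketch; the topic name, store name, and String serdes are placeholders for illustration):

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.GlobalKTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class WeeklyLookupTopology {

        public static StreamsBuilder build() {
            StreamsBuilder builder = new StreamsBuilder();

            // Materialize the weekly key-value topic as a GlobalKTable so every
            // application instance holds a full local copy of the table.
            GlobalKTable<String, String> lookup = builder.globalTable(
                    "weekly-lookup",                                   // hypothetical topic name
                    Consumed.with(Serdes.String(), Serdes.String()),
                    Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("weekly-lookup-store"));

            return builder;
        }
    }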

My concern is that the file doesn't specify which keys have been removed since the previous file. In that case, I don't want the old, no-longer-valid keys hanging around forever in the application stores.

An example:

Week 1:

a=1
b=2
c=3
d=4

Week 2:

a=1
b=2
d=4
e=5

I have compaction enabled on the topic, and a retention period of a couple of weeks.
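
The topic configuration looks roughly like this (a minimal sketch using the AdminClient; the broker address, topic name, and exact retention value are placeholders, and as far as I understand the cleanup policy needs both compact and delete for time-based retention to remove records at all):

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.concurrent.TimeUnit;

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class TopicRetentionConfig {

        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

            try (Admin admin = Admin.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "weekly-lookup");

                // "compact,delete" enables both log compaction and time-based deletion;
                // with "compact" alone, retention.ms would not remove old records.
                List<AlterConfigOp> ops = List.of(
                        new AlterConfigOp(new ConfigEntry("cleanup.policy", "compact,delete"),
                                AlterConfigOp.OpType.SET),
                        new AlterConfigOp(new ConfigEntry("retention.ms",
                                String.valueOf(TimeUnit.DAYS.toMillis(14))),
                                AlterConfigOp.OpType.SET));

                admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
            }
        }
    }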

What would happen to c in all the applications' GlobalKTable stores? Would it expire eventually once the topic removes the data from the stream? Or would it just stay there until the application restarted?

simonalexander2005

1 Answer

c will stay on a computing node that runs a Kafka Streams client until

  1. on the brokers c is removed due to the retention period (note that the retention period is a lower bound and it might take longer than the retention period until c is actually removed)
  2. the Kafka Streams client stops,
  3. the local state is wiped out, and
  4. a new Kafka Streams client is restarted on the same computing node.

Step 4 will then restore the local state from the topic: since the restarted client does not find any local state, it rebuilds the store from the topic, and because c has already been removed there, it will not read c anymore. Key-value stores in Kafka Streams do not have any retention period, and cleanups of the topics on the brokers do not trigger any actions on the Kafka Streams clients.
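
If you want to observe this, you can query the application's local copy of the global table; a minimal sketch (the store name is an assumption) would be:

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StoreQueryParameters;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    public class GlobalStoreCheck {

        // Look up a key in the application's local copy of the global table.
        // Returns null once "c" is no longer in the restored state.
        public static String lookup(KafkaStreams streams, String key) {
            ReadOnlyKeyValueStore<String, String> store = streams.store(
                    StoreQueryParameters.fromNameAndType(
                            "weekly-lookup-store",                  // hypothetical store name
                            QueryableStoreTypes.<String, String>keyValueStore()));
            return store.get(key);
        }
    }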

If you do not explicitly remove c by sending a tombstone record (c=null) to the topic, c will not be removed from the global table.
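
A minimal sketch of sending such a tombstone with a plain producer (the broker address, topic name, and String serdes are assumptions):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TombstoneSender {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // A null value is a tombstone: the GlobalKTable deletes the key when it
                // consumes this record, and compaction later drops it from the topic.
                producer.send(new ProducerRecord<>("weekly-lookup", "c", null));
                producer.flush();
            }
        }
    }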

Bruno Cadonna