
I have a topic T with message expiry (retention.ms) set to 2 days. The topic also has compaction enabled.

If I read a message from that topic into a KStream and then aggregate it into a KTable, will the KStream and/or KTable honour that 2-day expiry? When the message is no longer in topic T, will it also be removed from the KStream or KTable automatically? Or does some housekeeping process need to tombstone those messages?

simonalexander2005

1 Answer


delete.retention.ms, the topic's "dirty ratio" (min.cleanable.dirty.ratio), the min/max compaction lag, etc. are all properties that control how long keys remain in the topic before they are compacted away.
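To make those properties concrete, here is a minimal sketch of the topic-level overrides as a plain Java map (no Kafka client involved). The keys are real topic configs; the values are illustrative assumptions, including `cleanup.policy=compact,delete`, which is what a topic needs for both compaction and retention.ms to apply. `TopicConfigSketch` is a hypothetical name.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

final class TopicConfigSketch {
    // Builds topic-level config overrides as strings, the form Kafka expects.
    // Values are assumptions for illustration, not taken from the question.
    static Map<String, String> topicConfig() {
        Map<String, String> cfg = new LinkedHashMap<>();
        // Compaction plus time-based deletion on the same topic.
        cfg.put("cleanup.policy", "compact,delete");
        // The 2-day expiry from the question, in milliseconds.
        cfg.put("retention.ms", String.valueOf(Duration.ofDays(2).toMillis()));
        // How long tombstones are kept after compaction (assumed: 1 day).
        cfg.put("delete.retention.ms", String.valueOf(Duration.ofDays(1).toMillis()));
        // Fraction of uncompacted log that makes a partition eligible for cleaning.
        cfg.put("min.cleanable.dirty.ratio", "0.5");
        // Minimum age before a record may be compacted.
        cfg.put("min.compaction.lag.ms", "0");
        return cfg;
    }

    public static void main(String[] args) {
        topicConfig().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

A map like this could be passed as the config of a new topic or rendered as `key=value` pairs for the command-line tools.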

Yes, the stream/table should be updated automatically, but you may have remnants of the data elsewhere, in changelog topics or state stores, since those are stored outside of the original topic.

Regarding the first property, delete.retention.ms, the docs say that it

gives a bound on the time in which a consumer must complete a read if they begin from offset 0 to ensure that they get a valid snapshot of the final stage (otherwise delete tombstones may be collected before they complete their scan).

Therefore, if a stream/table has a time lag of less than delete.retention.ms, you should expect it to consume the tombstone records; if it has been running for longer than that, it may still hold data that has since been deleted.
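The difference between broker-side expiry and a tombstone can be illustrated with a toy in-memory model. This is not Kafka code: `RetentionModel` and its methods are hypothetical, the list stands in for topic T, and the map stands in for a KTable's state store. It only assumes what is described above, namely that expiry drops records from the log without notifying anyone, whereas a record with a null value (a tombstone) deletes the key from the table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RetentionModel {
    // One record in the toy log; a null value models a tombstone.
    static final class Rec {
        final String key; final String value; final long timestampMs;
        Rec(String key, String value, long timestampMs) {
            this.key = key; this.value = value; this.timestampMs = timestampMs;
        }
    }

    final List<Rec> log = new ArrayList<>();            // stands in for topic T
    final Map<String, String> table = new HashMap<>();  // stands in for the table's state

    // Append to the log and apply the record to the table,
    // as a continuously running table consumer would.
    void produce(String key, String value, long timestampMs) {
        log.add(new Rec(key, value, timestampMs));
        if (value == null) table.remove(key); else table.put(key, value);
    }

    // Broker-side retention: old records vanish from the log only.
    // Nothing is sent to the table, so its state is left untouched.
    void expire(long nowMs, long retentionMs) {
        log.removeIf(r -> nowMs - r.timestampMs > retentionMs);
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        RetentionModel m = new RetentionModel();
        m.produce("k", "v", 0L);
        m.expire(3 * day, 2 * day);
        System.out.println("log size after expiry: " + m.log.size());
        System.out.println("table still has k: " + m.table.containsKey("k"));
        m.produce("k", null, 3 * day); // explicit tombstone
        System.out.println("table has k after tombstone: " + m.table.containsKey("k"));
    }
}
```

In this model the key survives in the table after the log has expired it and disappears only once a tombstone is produced, which is the housekeeping step the question asks about.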

OneCricketeer
  • So how does the KStream/KTable know that the message has expired? Does the topic send a message to say "message X has now expired, remove it from the KStream/KTable"? What if we've mapped it or joined it to something else - would that mapped/joined version then also expire? – simonalexander2005 Aug 25 '21 at 13:41
  • No Kafka message is generated (to a topic); the only trace is the slf4j output from the `LogCleaner` thread. A stream itself maintains no state, so any new consumers will not be able to read the data, since it's gone from the broker. Tables are stateful and, as mentioned, the backing state store/changelog topics may keep data without a TTL until they are reset/restarted – OneCricketeer Aug 25 '21 at 14:02