
I understand that the default TTL is set to infinity (any non-positive value). However, if we need to retain data in the store for a maximum of 2 days, can we override this with a `RocksDBConfigSetter` implementation, i.e., by calling `options.setWalTtlSeconds(172800)`? Or would that conflict with Kafka Streams internals?

Ref: https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config
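For reference, the override being asked about would look roughly like the sketch below. The class name is an assumption; the 172800-second (2-day) value comes from the question, and whether Kafka Streams actually honors this setting is exactly what is being asked.

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

// Sketch of the override from the question: a RocksDBConfigSetter
// that sets RocksDB's WAL TTL to 2 days (172800 seconds).
public class CustomRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        options.setWalTtlSeconds(172800L); // 2 days
    }
}
```

It would be wired in via the `rocksdb.config.setter` Streams config (`StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG`), as described in the linked docs.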

Raman
  • Possible duplicate of [Kafka Streams - Low-Level Processor API - RocksDB TimeToLive(TTL)](https://stackoverflow.com/questions/43860114/kafka-streams-low-level-processor-api-rocksdb-timetolivettl) – Jacek Laskowski Aug 01 '19 at 18:24

1 Answer


This is currently not possible. Kafka Streams disables RocksDB's TTL feature in a hard-coded way for various technical reasons. There is a ticket tracking this: https://issues.apache.org/jira/browse/KAFKA-4212

For now, you could use a windowed store to expire old records after 2 days. I.e., you do a `stream.groupByKey().windowedBy(...).reduce(...)` with a `TimeWindow` of 1 ms and a "dummy" reduce that just returns the latest value for each key.
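A sketch of that workaround (the key/value types, serdes, and store name are assumptions; the 1 ms window and 2-day retention follow the answer's suggestion, and `withRetention` requires Kafka 2.1+):

```java
import java.time.Duration;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

// stream is an assumed KStream<String, String>.
// 1 ms windows mean each record effectively gets its own window;
// the "dummy" reduce just keeps the latest value per key.
stream.groupByKey()
      .windowedBy(TimeWindows.of(Duration.ofMillis(1)))
      .reduce(
          (oldValue, newValue) -> newValue, // keep latest value
          Materialized.<String, String, WindowStore<Bytes, byte[]>>as("two-day-store")
                      .withRetention(Duration.ofDays(2)));
```

Old windows are then dropped by the store's retention mechanism rather than by RocksDB's TTL.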

Matthias J. Sax
  • Hi Matthias, thanks. Since the store is enriched through the low-level Processor API, could you suggest a workaround similar to the high-level DSL one above? If not, given that there is an open ticket, I plan to iterate through the key-value store at a fixed frequency and delete old entries. – Raman Jun 28 '18 at 02:20
  • You can also create a windowed store instead of a plain key-value store via Processor API. – Matthias J. Sax Jun 28 '18 at 03:43
  • Hi Matthias, I'm facing a similar problem. I have a `windowedBy()` function in my topology with a 1-hour time window. But when I check disk usage, the `/tmp/kafka-streams` folder keeps growing, from 5 GB to 20 GB in 24 hours. How often does RocksDB remove old data? Thanks! My code is like `.windowedBy(TimeWindows.of(Duration.ofMinutes(60)).grace(Duration.ZERO)).reduce((event1, event2) -> event2)` – thinktwice Sep 19 '19 at 20:18
  • By default, retention time is 24h. You can configure it via `reduce(..., Materialized.as(null).withRetention(...))`. – Matthias J. Sax Sep 19 '19 at 22:41
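For the Processor API case discussed in the comments, the punctuation-based cleanup Raman describes can be sketched as follows. The store name, the one-hour cleanup interval, and the idea of tracking a per-key timestamp in the store are all assumptions for illustration, not part of the answer:

```java
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

// Sketch: track each key's last-update time in a key-value store and
// purge entries older than 2 days on a wall-clock punctuation.
public class ExpiringProcessor extends AbstractProcessor<String, String> {
    private KeyValueStore<String, Long> timestamps; // key -> last update time (ms)

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        super.init(context);
        timestamps = (KeyValueStore<String, Long>) context.getStateStore("timestamps");
        context.schedule(Duration.ofHours(1), PunctuationType.WALL_CLOCK_TIME, now -> {
            final long cutoff = now - Duration.ofDays(2).toMillis();
            try (final KeyValueIterator<String, Long> it = timestamps.all()) {
                while (it.hasNext()) {
                    final KeyValue<String, Long> entry = it.next();
                    if (entry.value < cutoff) {
                        timestamps.delete(entry.key); // expire old entry
                    }
                }
            }
        });
    }

    @Override
    public void process(final String key, final String value) {
        timestamps.put(key, context().timestamp());
    }
}
```

Alternatively, as Matthias notes above, a windowed store can be created directly via the Processor API, which pushes the expiry work onto the store's retention mechanism instead of a manual scan.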