
The docs are unclear. When would I want to set `retainDuplicates` to `true` or `false`? What is it used for? Is it something specific to RocksDB?

https://kafka.apache.org/21/javadoc/org/apache/kafka/streams/state/Stores.html#persistentWindowStore-java.lang.String-java.time.Duration-java.time.Duration-boolean-

Digging through the Streams internal code, it seems to be used to update some sequence number:

RocksDBWindowStore.java

```java
private void maybeUpdateSeqnumForDups() {
    if (this.retainDuplicates) {
        this.seqnum = this.seqnum + 1 & 2147483647;
    }
}
```
Chris
  • Yes. If `retainDuplicates` is set, then the key is replaced by the pair `(key, seqnum)` before data is stored in the state store. See here: https://github.com/apache/kafka/blob/0667fe2bfda95b756faa589896b7da52622ef871/streams/src/main/java/org/apache/kafka/streams/state/internals/InMemoryWindowStore.java#L260. It might be used to keep all updates for an entity in the store. – user152468 Apr 10 '19 at 06:55
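To make the quoted snippet concrete, here is a minimal sketch of what the counter does. Note that `this.seqnum + 1 & 2147483647` parses as `(seqnum + 1) & 2147483647`, because `+` binds tighter than `&` in Java, and 2147483647 is `Integer.MAX_VALUE` (`0x7FFFFFFF`). The class `SeqnumSketch` and its `storedKey` helper are hypothetical; the `key@windowStart#seqnum` string is only a readable stand-in, and treating the seqnum as always present in the key (staying at 0 when duplicates are not retained) is an assumption of this model, not a description of Kafka's byte layout.

```java
// Hypothetical, simplified model of the seqnum handling quoted in the question; not Kafka's class.
class SeqnumSketch {
    private final boolean retainDuplicates;
    private int seqnum;

    SeqnumSketch(boolean retainDuplicates, int initialSeqnum) {
        this.retainDuplicates = retainDuplicates;
        this.seqnum = initialSeqnum;
    }

    // Same shape as maybeUpdateSeqnumForDups(): the counter only moves when duplicates are retained.
    void maybeUpdateSeqnumForDups() {
        if (retainDuplicates) {
            // Masking with 0x7FFFFFFF clears the sign bit, so incrementing past
            // Integer.MAX_VALUE wraps around to 0 instead of becoming negative.
            seqnum = (seqnum + 1) & 0x7FFFFFFF;
        }
    }

    int seqnum() {
        return seqnum;
    }

    // Readable stand-in for the stored key (record key, window start, seqnum).
    String storedKey(String key, long windowStart) {
        return key + "@" + windowStart + "#" + seqnum;
    }

    public static void main(String[] args) {
        SeqnumSketch dups = new SeqnumSketch(true, 0);
        dups.maybeUpdateSeqnumForDups();
        String first = dups.storedKey("user-1", 1000L);
        dups.maybeUpdateSeqnumForDups();
        String second = dups.storedKey("user-1", 1000L);
        System.out.println(first + " vs " + second); // user-1@1000#1 vs user-1@1000#2

        SeqnumSketch noDups = new SeqnumSketch(false, 0);
        noDups.maybeUpdateSeqnumForDups();
        System.out.println(noDups.storedKey("user-1", 1000L)); // user-1@1000#0

        SeqnumSketch nearMax = new SeqnumSketch(true, Integer.MAX_VALUE);
        nearMax.maybeUpdateSeqnumForDups();
        System.out.println(nearMax.seqnum()); // 0
    }
}
```

With duplicates retained, two puts for the same record key and window start produce two distinct stored keys; without it, they collide.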

1 Answer


Well, as the name indicates, you can enable storing duplicates if you want to store multiple rows with the same key. For window stores, the key is composed of the record key and the window start timestamp.
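A minimal in-memory sketch of the difference, assuming a store whose internal key is (record key, window start, seqnum). `WindowStoreSketch` is hypothetical and only loosely modeled on a window store's `put`/`fetch` shape; it is not Kafka's `WindowStore` API. It also shows that the duplicates may share the same window.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a window store; not Kafka's implementation.
class WindowStoreSketch {
    // Internal composite key: record key + window start + sequence number.
    private record StoreKey(String key, long windowStart, int seqnum) {}

    private final boolean retainDuplicates;
    private final Map<StoreKey, String> data = new LinkedHashMap<>();
    private int seqnum = 0;

    WindowStoreSketch(boolean retainDuplicates) {
        this.retainDuplicates = retainDuplicates;
    }

    void put(String key, String value, long windowStart) {
        if (retainDuplicates) {
            seqnum = (seqnum + 1) & 0x7FFFFFFF;
        }
        // Without retained duplicates, seqnum stays 0, so a second put for the
        // same (key, windowStart) replaces the earlier value.
        data.put(new StoreKey(key, windowStart, seqnum), value);
    }

    // All values for `key` whose window start lies in [timeFrom, timeTo], in insertion order.
    List<String> fetch(String key, long timeFrom, long timeTo) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<StoreKey, String> e : data.entrySet()) {
            StoreKey k = e.getKey();
            if (k.key().equals(key) && k.windowStart() >= timeFrom && k.windowStart() <= timeTo) {
                result.add(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        WindowStoreSketch noDups = new WindowStoreSketch(false);
        noDups.put("user-1", "a", 1000L);
        noDups.put("user-1", "b", 1000L);
        System.out.println(noDups.fetch("user-1", 0L, 2000L)); // [b]

        WindowStoreSketch dups = new WindowStoreSketch(true);
        dups.put("user-1", "a", 1000L);
        dups.put("user-1", "b", 1000L);
        System.out.println(dups.fetch("user-1", 0L, 2000L)); // [a, b]
    }
}
```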

Kafka Streams uses this feature for KStream-KStream joins. In this case, each input record is stored in its own window in the store (using the record timestamp as the window start timestamp). Because there might be multiple records with the same key and the same timestamp, it's required to enable this flag to compute the correct join. Otherwise, a later record would overwrite an earlier one with the same key and timestamp, and the join result might be incomplete.
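A heavily simplified sketch of why the join can lose results. `JoinSketch` is hypothetical: it buffers only the left side, keyed by (key, timestamp, seqnum), then matches right records whose timestamps are within a fixed window. Kafka's real join keeps a window store per side, processes records in arrival order, and handles retention and grace periods, none of which is modeled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one side of a windowed stream-stream inner join.
class JoinSketch {
    record Event(String key, long timestamp, String value) {}

    private record StoreKey(String key, long timestamp, int seqnum) {}

    static List<String> join(List<Event> left, List<Event> right, long windowMs, boolean retainDuplicates) {
        Map<StoreKey, String> leftStore = new LinkedHashMap<>();
        int seqnum = 0;
        for (Event e : left) {
            if (retainDuplicates) {
                seqnum = (seqnum + 1) & 0x7FFFFFFF;
            }
            // Each left record gets its own "window" whose start is the record timestamp.
            leftStore.put(new StoreKey(e.key(), e.timestamp(), seqnum), e.value());
        }
        List<String> results = new ArrayList<>();
        for (Event r : right) {
            for (Map.Entry<StoreKey, String> entry : leftStore.entrySet()) {
                StoreKey k = entry.getKey();
                if (k.key().equals(r.key()) && Math.abs(k.timestamp() - r.timestamp()) <= windowMs) {
                    results.add(entry.getValue() + "+" + r.value());
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Two left records share the same key AND the same timestamp.
        List<Event> left = List.of(new Event("k", 100L, "L1"), new Event("k", 100L, "L2"));
        List<Event> right = List.of(new Event("k", 150L, "R1"));
        System.out.println(join(left, right, 1000L, true));  // [L1+R1, L2+R1]
        System.out.println(join(left, right, 1000L, false)); // [L2+R1]
    }
}
```

Without retained duplicates, `L2` overwrites `L1` in the buffer, so the `L1+R1` result is silently missing.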

Matthias J. Sax
  • Do the duplicate keys need to be in different windows? The key-value store does not have this option, so I'm trying to understand the difference. – Chris Apr 18 '19 at 04:08
  • No, the keys can be in the same window. And yes, it's a feature of the window store only. – Matthias J. Sax Apr 18 '19 at 21:49