I use Kafka Streams to join two streams with a 3-day join window:

    ...
    // 3-day join window, expressed in hours
    private final long retentionHours = Duration.ofDays(3).toHours();

    ...
    var joinWindow = JoinWindows.of(Duration.ofHours(retentionHours))
                                .grace(Duration.ofMillis(0));
    var joinStores = StreamJoined.with(Serdes.String(), aggregatorSerde, aggregatorSerde)
                                 .withStoreName("STORE-1")
                                 .withName("STORE-2");
    stream1.join(stream2, streamJoiner(), joinWindow, joinStores);

With the above implementation, I found that Kafka Streams created a state folder, /tmp/kafka-streams (it looks like RocksDB), and it grows constantly. The state store in the Kafka cluster also grows constantly.

So I changed the join implementation to:

    ...
    // 3-day join window, expressed in hours
    private final long retentionHours = Duration.ofDays(3).toHours();
    ...
    var joinWindow = JoinWindows.of(Duration.ofHours(retentionHours))
                                .grace(Duration.ofMillis(0));
    var joinStores = StreamJoined.with(Serdes.String(), aggregatorSerde, aggregatorSerde)
                                 .withStoreName("STORE-1")
                                 .withName("STORE-2")
                                 .withThisStoreSupplier(createStoreSupplier("MEM-STORE-1"))
                                 .withOtherStoreSupplier(createStoreSupplier("MEM-STORE-2"));
    stream1.join(stream2, streamJoiner(), joinWindow, joinStores);

    ...
    private WindowBytesStoreSupplier createStoreSupplier(String storeName) {
        // Store window = 2 x the join window (before + after); with zero grace, retention is the same
        var window = Duration.ofHours(retentionHours * 2)
                             .toMillis();
        // Arguments: name, retention period (ms), window size (ms), retainDuplicates
        return new InMemoryWindowBytesStoreSupplier(storeName, window, window, true);
    }

Now the state folder /tmp/kafka-streams is no longer created.

Does it mean that InMemoryWindowBytesStoreSupplier doesn't use disk at all? If yes, how does it work?

Also, I still see that state store in Kafka cluster grows constantly.

1 Answer

> Does it mean that InMemoryWindowBytesStoreSupplier doesn't use disk at all? If yes, how does it work?

IIRC, InMemoryWindowBytesStore doesn't use disk at all.

Generally speaking, a logical state store is in fact partitioned into multiple state store 'instances' (think: each stream task has its own, local state store instance). For the InMemoryWindowBytesStore specifically, and by design, these store instances manage all their local data in memory.
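
To give a feel for what "manage all their local data in memory" means, here is a toy sketch of a single store instance. It is not the Kafka Streams implementation: the class `ToyInMemoryWindowStore` and its methods are hypothetical names made up for illustration. It only shows the general idea of keeping records in a sorted map on the heap and evicting them once they fall out of the retention period.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only (hypothetical class): NOT the Kafka Streams implementation.
public class ToyInMemoryWindowStore {
    private final long retentionMs;
    // timestamp -> values written at that timestamp; held purely on the heap
    private final TreeMap<Long, List<String>> recordsByTimestamp = new TreeMap<>();
    // Highest record timestamp seen so far
    private long observedStreamTime = Long.MIN_VALUE;

    public ToyInMemoryWindowStore(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    public void put(long timestampMs, String value) {
        observedStreamTime = Math.max(observedStreamTime, timestampMs);
        long minLiveTimestamp = observedStreamTime - retentionMs + 1;
        // Evict everything that has fallen out of the retention period.
        recordsByTimestamp.headMap(minLiveTimestamp, false).clear();
        // Records that are already expired on arrival are dropped.
        if (timestampMs >= minLiveTimestamp) {
            recordsByTimestamp.computeIfAbsent(timestampMs, t -> new ArrayList<>()).add(value);
        }
    }

    // Returns all values with fromMs <= timestamp <= toMs, in timestamp order.
    public List<String> fetch(long fromMs, long toMs) {
        List<String> result = new ArrayList<>();
        for (List<String> values : recordsByTimestamp.subMap(fromMs, true, toMs, true).values()) {
            result.addAll(values);
        }
        return result;
    }

    public int size() {
        return recordsByTimestamp.values().stream().mapToInt(List::size).sum();
    }
}
```

In this sketch nothing touches the disk, and the amount of data held is bounded by the retention period rather than growing forever. Each stream task would hold its own such instance.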

> Also, I still see that state store in Kafka cluster grows constantly.

However, the InMemoryWindowBytesStore is still fault-tolerant. This often confuses new Kafka Streams developers because, in most software, "in memory" implies that data is lost if something goes wrong. That is not the case with Kafka Streams. By default, a state store is backed up durably to its Kafka changelog topic, regardless of whether you use the default state store (with RocksDB) or the in-memory state store. This explains why you see the in-memory store's changelog data in the Kafka cluster. The data should not grow forever, btw: changelog topics are compacted, and for windowed stores such as these join stores, records older than the store's retention period (plus some slack) are also deleted.

Note: What can happen with the in-memory store, however, is that your application instances run out of memory (OOM) and crash. Your state data is not lost, as explained above, but your application will be down, or will run only partially if some app instances hit OOM and others do not. This OOM problem doesn't apply to the default store (RocksDB), as it manages its data on disk and uses memory (RAM) only for caching purposes. But, again, this question of app availability is orthogonal to data safety: your data is safe regardless of whether your app is crashing or not.
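
To judge whether OOM is a realistic risk, a back-of-the-envelope estimate can help. The sketch below is only that: the class `JoinStoreMemoryEstimate`, the input rate (100 records/s per stream), and the record size (500 bytes) are hypothetical numbers, not taken from the question. The 6-day retention assumes twice the 3-day join window, as in the question's `createStoreSupplier`. It ignores per-entry overhead of the in-memory data structures, so real usage would be higher.

```java
import java.time.Duration;

// Rough sizing sketch with hypothetical inputs; ignores per-entry JVM overhead.
public class JoinStoreMemoryEstimate {

    // Bytes held by one join store once it is "full", i.e. after one retention period.
    public static long estimateBytes(long recordsPerSecond, Duration retention, long bytesPerRecord) {
        return recordsPerSecond * retention.getSeconds() * bytesPerRecord;
    }

    public static void main(String[] args) {
        Duration retention = Duration.ofDays(6);   // assumed: 2 x the 3-day join window
        long perStore = estimateBytes(100, retention, 500);
        long bothStores = 2 * perStore;            // a stream-stream join keeps one store per side
        System.out.printf("per store: %.1f GB, both stores: %.1f GB%n",
                perStore / 1e9, bothStores / 1e9);
    }
}
```

With these assumed inputs, the total is the amount across all app instances; each instance holds roughly its share of the partitions, so the per-instance figure depends on how many instances you run.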

miguno
  • Michael, one more clarification. Is it possible to set a limit for InMemory or RocksDB storage, so that once the limit is reached, Kafka spills data from RAM to disk and loads it back from the state store when needed? – Viktor Kurchenko Aug 03 '20 at 19:53
  • No, there is no such setting. InMemory will *never* write to disk, and the default store (with RocksDB) is *always* writing to disk. – miguno Aug 04 '20 at 07:34
  • You can of course write your own state store implementation (it's not that hard) to get this behavior. Here's an example on how to implement your own, in this case a Count Min Sketch backend for the state store to allow for probabilistic counting: https://github.com/confluentinc/kafka-streams-examples/blob/5.5.0-post/src/test/scala/io/confluent/examples/streams/ProbabilisticCountingScalaIntegrationTest.scala – miguno Aug 04 '20 at 07:36
  • Michael, thank you! Your replies were very useful! But I'm going to fix this with a different approach. Instead of using the native Kafka join, I'll create 2 state stores (for streams 1 and 2) and implement the join logic manually. With such an approach I can remove events from the state stores immediately after the join. – Viktor Kurchenko Aug 06 '20 at 07:29
  • Happy to hear it was helpful. If you want to build a custom join, you may want to take a look at the custom join example I wrote a few months back: https://github.com/confluentinc/kafka-streams-examples/blob/5.5.0-post/src/test/java/io/confluent/examples/streams/CustomStreamTableJoinIntegrationTest.java – miguno Aug 06 '20 at 10:28
  • You can also disable fault-tolerance if you want, but I would not recommend it for obvious reasons. -- The other thing is, if you want to limit the memory, you can also make the join window smaller -- this way, fewer records need to be buffered in the stores and thus you use less memory. – Matthias J. Sax Aug 07 '20 at 22:32
  • Matthias thanks for the reply! Yeah, I thought about smaller join window, but it's not my case. Looks like the ideal solution will be just to remove records from state stores immediately after join. – Viktor Kurchenko Aug 08 '20 at 22:03