I use Kafka to join two streams with 3 days join window:
...
private final long retentionHours = Duration.ofDays(3);
...
var joinWindow = JoinWindows.of(Duration.ofMinutes(retentionHours))
.grace(Duration.ofMillis(0));
var joinStores = StreamJoined.with(Serdes.String(), aggregatorSerde, aggregatorSerde)
.withStoreName("STORE-1")
.withName("STORE-2");
stream1.join(stream2, streamJoiner(), joinWindow, joinStores);
With above implementation, I found that Kafka created state folder: /tmp/kafka-streams, (looks like RocksDB) and it grows constantly. Also, state store in Kafka cluster grows constantly.
So, I changed streams join implementation to:
...
private final long retentionHours = Duration.ofDays(3);
...
var joinWindow = JoinWindows.of(Duration.ofMinutes(retentionHours))
.grace(Duration.ofMillis(0));
var joinStores = StreamJoined.with(Serdes.String(), aggregatorSerde, aggregatorSerde)
.withStoreName("STORE-1")
.withName("STORE-2")
.withThisStoreSupplier(createStoreSupplier("MEM-STORE-1"))
.withOtherStoreSupplier(createStoreSupplier("MEM-STORE-2"));
stream1.join(stream2, streamJoiner(), joinWindow, joinStores);
...
private WindowBytesStoreSupplier createStoreSupplier(String storeName) {
var window = Duration.ofMinutes(retentionHours * 2)
.toMillis();
return new InMemoryWindowBytesStoreSupplier(storeName, window, window, true);
}
Now, there is no state folder: /tmp/kafka-streams.
Does it mean that InMemoryWindowBytesStoreSupplier doesn't use disk at all? If yes, how does it work?
Also, I still see that state store in Kafka cluster grows constantly.