I am trying to understand RocksDB behavior in the Kafka Streams Processor API. I am configuring a persistent StateStore using the default RocksDB store that Kafka Streams provides.
StoreBuilder<KeyValueStore<String, Long>> countStoreBuilder =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("Counts"),
        Serdes.String(),
        Serdes.Long());
I am not doing any aggregation, join, or windowing. I just receive records, compare some of them to previous items in the store, and save some of them in the state store.
The developer guide mentions that you can enable record caches in the Processor API by calling .withCachingEnabled() on the above builder.
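For concreteness, this is roughly what I understand enabling the cache to look like. It is only a sketch: the class and method names (CachedCountStore, cachedCountStoreBuilder) are made up for illustration, and it assumes the kafka-streams library is on the classpath.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

// Hypothetical holder class, used only to make the snippet compile on its own.
public class CachedCountStore {

    // Same persistent "Counts" store as above, with the record cache switched on.
    static StoreBuilder<KeyValueStore<String, Long>> cachedCountStoreBuilder() {
        return Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("Counts"),
                Serdes.String(),
                Serdes.Long())
            // Puts the Kafka Streams record cache in front of the RocksDB store.
            .withCachingEnabled();
    }
}
```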
The cache "serves as a read cache to speed up reading data from a state store" - Record Caches Kafka Streams
However, my understanding is that a persistent RocksDB store buffers its data in memory first and spills to disk only if the state doesn't fit in RAM.
"RocksDB is just used as an internal lookup table (that is able to flush to disk if the state does not fit into memory)." "RocksDB flushing is only required because state could be larger than available main-memory." - Kafka Streams Internal Data Management
So how do record caches speed up reads from the state store if both are buffered in memory? It seems to me that record caches overlap with what RocksDB already does.