the SS is not able to get all key values in statestore
This is the expected behavior. The data of a "logical" state store in Kafka Streams is actually partitioned (sharded) across the local state store instances held by the running instances of your distributed Kafka Streams application -- and this is true even if you run only 1 application instance, like 1 Docker container for your app. Let me explain below.
A simplified example to illustrate the nature of partitioned state stores: if your application reads from an input topic with 5 partitions, then the processing topology of this application would use 5 stream tasks, and each stream task would get one partition of the "logical" state store (see Kafka Streams Architecture). If you run only 1 application instance (like 1 Docker container) for your application, then this single instance will execute all 5 stream tasks, but these stream tasks operate in a shared-nothing fashion -- which means the data is still partitioned. The same applies to KTables in Kafka Streams, which are partitioned in this manner as well.
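If it helps to see this in code, here is a minimal sketch of a KTable materialized as a state store; the topic name "input-topic" and the store name "my-store" are made up for illustration. With a 5-partition input topic, that one "logical" store exists as 5 shards, one per stream task:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class PartitionedStoreExample {

    public static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // Materialize "input-topic" as a KTable backed by a state store
        // named "my-store".
        builder.table(
            "input-topic",
            Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("my-store")
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String()));

        // With 5 input partitions, Kafka Streams creates 5 stream tasks, and
        // each task maintains its own shard of "my-store" that holds only the
        // keys from the single partition assigned to that task. No single
        // shard ever contains all keys.
        return builder.build();
    }
}
```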
See also: Is Kafka Stream StateStore global over all instances or just local?
Your example above would only work in the special case where the input topic has only 1 partition, because then there is only 1 stream task, and thus only 1 state store (which would have access to all available keys in the input data).
Trying to access all key values in the defined statestore [...]
Now, if you do want to have access to all available keys in the input data, you have two options (unless you want to go down the route of the special case of an input topic with only 1 partition):
- Option 1: Use global state stores (or GlobalKTables) instead of the normal, partitioned state stores. Global state stores can be defined/created via StreamsBuilder#addGlobalStore(...), but IIRC you don't need to explicitly add ("attach") global stores to Processors, which you would have to do for normal state stores; instead, global stores can be accessed by any Processor automatically. See the first sketch after this list.
- Option 2: Use the interactive queries feature (aka queryable state) in Kafka Streams; see the second sketch further below.
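For Option 1, here is a minimal sketch using a GlobalKTable (again with made-up topic/store names); StreamsBuilder#addGlobalStore(...) plus a custom Processor would be the lower-level way to achieve the same thing:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class GlobalStoreExample {

    public static GlobalKTable<String, String> defineGlobalTable(StreamsBuilder builder) {
        // A GlobalKTable is backed by a global state store ("global-store"):
        // every application instance keeps a full copy of the topic's data,
        // so all keys are locally available regardless of how many partitions
        // "input-topic" has.
        return builder.globalTable(
            "input-topic",
            Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("global-store")
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String()));
    }
}
```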
Note that, in both options, you can access the data in the state store(s) only for reading; you cannot write directly to the state stores. If you need to modify the data, you must update it indirectly through the input topics that are used to populate the stores.
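For Option 2, here is a minimal sketch of an interactive query against the (made-up) store "my-store", assuming an already running KafkaStreams instance. The returned view is read-only, and in a multi-instance deployment each instance only serves its own shards, so you would additionally use KafkaStreams#queryMetadataForKey(...) to route the query for a given key to the instance that hosts it:

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class InteractiveQueryExample {

    // 'streams' must already be started and in RUNNING state.
    public static void dumpLocalKeys(KafkaStreams streams) {
        ReadOnlyKeyValueStore<String, String> view = streams.store(
            StoreQueryParameters.fromNameAndType(
                "my-store", QueryableStoreTypes.<String, String>keyValueStore()));

        // Point lookup for a single key.
        String value = view.get("some-key");
        System.out.println("some-key -> " + value);

        // Iterate over all keys held by THIS instance; the view is read-only.
        try (KeyValueIterator<String, String> it = view.all()) {
            while (it.hasNext()) {
                KeyValue<String, String> kv = it.next();
                System.out.println(kv.key + " -> " + kv.value);
            }
        }
    }
}
```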