the SS is not able to get all key values in statestore
This is the expected behavior. The data of a "logical" state store in Kafka Streams is actually partitioned (sharded) across the local state store instances held by the running instances of your distributed Kafka Streams application -- and this is true even if you run only 1 application instance, like 1 Docker container for your app. Let me explain below.
A simplified example to illustrate the nature of partitioned state stores: if your application reads from an input topic with 5 partitions, then the processing topology of this application would use 5 stream tasks, and each stream task would get one partition of the "logical" state store (see Kafka Streams Architecture). If you run only 1 application instance (like 1 Docker container) for your application, then this single instance will execute all 5 stream tasks, but these stream tasks operate in a shared-nothing fashion -- which means the data is still partitioned. The same applies to KTables in Kafka Streams, which are partitioned in this manner as well.
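If it helps to see this in code, here is a minimal sketch of a KTable materialized as a state store; the topic name "input-topic" and the store name "my-store" are made up for illustration. With a 5-partition input topic, that one "logical" store exists as 5 shards, one per stream task:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class PartitionedStoreExample {

    public static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // Materialize "input-topic" as a KTable backed by a state store
        // named "my-store".
        builder.table(
            "input-topic",
            Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("my-store")
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String()));

        // With 5 input partitions, Kafka Streams creates 5 stream tasks, and
        // each task maintains its own shard of "my-store" that holds only the
        // keys from the single partition assigned to that task. No single
        // shard ever contains all keys.
        return builder.build();
    }
}
```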
See also: Is Kafka Stream StateStore global over all instances or just local?
Your example above would only work in the special case where the input topic has only 1 partition, because then there is only 1 stream task, and thus only 1 state store (which would have access to all available keys in the input data).
Trying to access all key values in the defined statestore [...]
Now, if you do want to have access to all available keys in the input data, you have two options (unless you want to go down the route of the special case of an input topic with only 1 partition):
- Option 1: Use global state stores (or GlobalKTables) instead of the normal, partitioned state stores. Global state stores can be defined/created via StreamsBuilder#addGlobalStore(...), but IIRC you don't need to explicitly add ("attach") global stores to Processors, which you would have to do for normal state stores; instead, global stores can be accessed by any Processor automatically. See the first sketch after this list.
- Option 2: Use the interactive queries feature (aka queryable state) in Kafka Streams; see the second sketch further below.
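For Option 1, here is a minimal sketch using a GlobalKTable (again with made-up topic/store names); StreamsBuilder#addGlobalStore(...) plus a custom Processor would be the lower-level way to achieve the same thing:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class GlobalStoreExample {

    public static GlobalKTable<String, String> defineGlobalTable(StreamsBuilder builder) {
        // A GlobalKTable is backed by a global state store ("global-store"):
        // every application instance keeps a full copy of the topic's data,
        // so all keys are locally available regardless of how many partitions
        // "input-topic" has.
        return builder.globalTable(
            "input-topic",
            Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("global-store")
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String()));
    }
}
```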
Note that, in both options, you can access the data in the state store(s) only for reading; you cannot write directly to the state stores. If you need to modify the data, you must update it indirectly through the input topics that are used to populate the stores.
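For Option 2, here is a minimal sketch of an interactive query against the (made-up) store "my-store", assuming an already running KafkaStreams instance. The returned view is read-only, and in a multi-instance deployment each instance only serves its own shards, so you would additionally use KafkaStreams#queryMetadataForKey(...) to route the query for a given key to the instance that hosts it:

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class InteractiveQueryExample {

    // 'streams' must already be started and in RUNNING state.
    public static void dumpLocalKeys(KafkaStreams streams) {
        ReadOnlyKeyValueStore<String, String> view = streams.store(
            StoreQueryParameters.fromNameAndType(
                "my-store", QueryableStoreTypes.<String, String>keyValueStore()));

        // Point lookup for a single key.
        String value = view.get("some-key");
        System.out.println("some-key -> " + value);

        // Iterate over all keys held by THIS instance; the view is read-only.
        try (KeyValueIterator<String, String> it = view.all()) {
            while (it.hasNext()) {
                KeyValue<String, String> kv = it.next();
                System.out.println(kv.key + " -> " + kv.value);
            }
        }
    }
}
```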