I am performing an aggregation with Kafka Streams that keeps all my aggregated records in a key-value state store, keyed by a value I generate to uniquely identify each aggregation. I am not using any Kafka window for this aggregation, so the topology keeps listening to input data and keeps aggregating. Now, based on the key, I need to apply different logic to look records up in the state store and send my data downstream.
Kafka's KeyValueStore gives me four query methods: all, prefixScan, range and get. Given the keys I am generating, I find I can only use all and get.
- If I use get, Kafka looks up the value for that one key, so if I have a list of keys I have to issue one lookup against the state store per key in the list.
- If I manage to express my search as a regex over the keys, I can use all() and iterate over every entry in the state store in Java code, matching each key against the regex and forwarding matches downstream. But again, that is a manual iteration over the complete state store.
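The usual way around both problems is key design rather than regex matching: a RocksDB-backed store keeps keys in sorted byte order, so range() and prefixScan() only touch the contiguous slice of keys sharing a prefix instead of the whole store. If the dimension you search by is encoded as the leading component of the key, a prefix scan replaces both the per-key get() loop and the full all() iteration. Below is a minimal stand-in sketch of that idea (a TreeMap plays the role of the sorted store; the key format `<entityId>|<recordId>` is hypothetical, not from the question):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class PrefixScanSketch {

    // Return the entries whose key starts with the given prefix.
    // Because the map is sorted, cost is proportional to the number of
    // matching entries, not to the total size of the store -- the same
    // property a RocksDB prefixScan relies on.
    static NavigableMap<String, String> prefixScan(
            NavigableMap<String, String> store, String prefix) {
        // '\uffff' sorts after any character that can follow the prefix,
        // so [prefix, prefix + '\uffff') covers exactly the prefix slice.
        return store.subMap(prefix, true, prefix + '\uffff', false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        // Keys designed as "<entityId>|<recordId>" (hypothetical format),
        // so all records for one entity are contiguous in sorted order.
        store.put("order-42|a", "v1");
        store.put("order-42|b", "v2");
        store.put("order-99|a", "v3");

        NavigableMap<String, String> hits = prefixScan(store, "order-42|");
        if (hits.size() != 2) throw new AssertionError("expected 2 hits");
        System.out.println(hits.keySet());
    }
}
```

The same layout works against a real KeyValueStore: generate keys so that the component you later search by comes first, then query with prefixScan (or range between two boundary keys) instead of scanning all().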
P.S. At any point in time my state store will contain at least a billion records.
Can someone please suggest the best-performing way to retrieve data by key search from a Kafka key-value state store? Any alternative to this approach is also appreciated.
Update: after evicting data from the state store I do not delete it; instead I want to update it with a flag indicating whether it has been evicted. That requires read/write access to the state store, which is only available from inside the topology, since interactive queries give read-only access. This is where my knowledge of Kafka ends; please correct me if I am wrong.
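That understanding matches the API: interactive queries hand back a ReadOnlyKeyValueStore, while writable access comes from inside the topology, e.g. in a Processor that fetches the store from its context. The in-processor logic itself is just read-modify-write. A sketch of that step, with a plain Map standing in for the writable store and a hypothetical Aggregate value class carrying the flag (neither name is from the question):

```java
import java.util.HashMap;
import java.util.Map;

public class EvictionFlagSketch {

    // Hypothetical aggregate value: payload plus an "evicted" marker, so
    // eviction is recorded by rewriting the entry rather than deleting it.
    static final class Aggregate {
        final String payload;
        final boolean evicted;
        Aggregate(String payload, boolean evicted) {
            this.payload = payload;
            this.evicted = evicted;
        }
    }

    // Mirrors what a Kafka Streams Processor would do with its writable
    // KeyValueStore: read the current value, store it back with the flag set.
    static void markEvicted(Map<String, Aggregate> store, String key) {
        Aggregate current = store.get(key);
        if (current != null && !current.evicted) {
            store.put(key, new Aggregate(current.payload, true));
        }
    }

    public static void main(String[] args) {
        Map<String, Aggregate> store = new HashMap<>();
        store.put("agg-1", new Aggregate("data", false));
        markEvicted(store, "agg-1");
        if (!store.get("agg-1").evicted) throw new AssertionError("flag not set");
        System.out.println("agg-1 evicted=" + store.get("agg-1").evicted);
    }
}
```

In a real topology this method body would live inside process(), with the store obtained from the processor's context during init; the punctuation/eviction trigger and serde details are left out here.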