I am working on a Kafka Streams application built with Spring Cloud Stream. In this application I need to:
- Consume a continuous stream of messages that can be retrieved at a later time.
- Persist a list of the message IDs matching some criteria.
- In a separate thread, run a scheduler that reads the message IDs at a regular interval, retrieves the corresponding messages, and performs an action on them.
- Remove the processed message IDs from the list so that work is not duplicated.
I have considered implementing this as follows:
- Consume the incoming stream of messages as a materialized KTable so that I can look up and retrieve messages by key at a later time.
- Materialize the list of message IDs in another state store.
- Use Spring's scheduling mechanism to run a separate thread which reads from the state store via the InteractiveQueryService bean.
The problem I hit is that the InteractiveQueryService provides read-only access to the state store, so I cannot remove entries from the other thread. I have decided not to use Kafka Streams' punctuate capability because its semantics are different: my scheduling thread must always run at a regular interval, irrespective of the processing of inbound messages.
Another alternative might be to use the low-level Processor API and pass a reference to the writable state store to my scheduler thread, synchronizing on write operations. However, I'm not sure whether this is feasible, or whether there are other constraints on accessing the state store from a separate thread like this.
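To make the access pattern I have in mind concrete, here is a minimal plain-Java model of it. It is only a sketch: PendingMessages, onMessage, drainPendingIds, processPending and startScheduler are hypothetical names, and the in-memory map and set merely stand in for the two state stores. It uses no Kafka Streams API, so it says nothing about whether a real state store tolerates writes from a non-stream thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for the two state stores; not a Kafka Streams type.
class PendingMessages {
    // Models the materialized table: message ID -> message payload.
    private final Map<String, String> messagesById = new ConcurrentHashMap<>();
    // Models the list of IDs awaiting processing (insertion-ordered, no duplicates).
    private final Set<String> pendingIds = new LinkedHashSet<>();
    private final Object lock = new Object();

    // Called from the stream-processing thread for every inbound message.
    void onMessage(String id, String payload, boolean matchesCriteria) {
        messagesById.put(id, payload);
        if (matchesCriteria) {
            synchronized (lock) {
                pendingIds.add(id);
            }
        }
    }

    // Called from the scheduler thread: copy and clear the pending IDs under
    // one lock, so an ID is handed out at most once.
    List<String> drainPendingIds() {
        synchronized (lock) {
            List<String> drained = new ArrayList<>(pendingIds);
            pendingIds.clear();
            return drained;
        }
    }

    String lookup(String id) {
        return messagesById.get(id);
    }

    // One scheduler tick: look up each drained ID and apply the action.
    // Returns the number of messages the action was applied to.
    int processPending(Consumer<String> action) {
        int processed = 0;
        for (String id : drainPendingIds()) {
            String payload = lookup(id);
            if (payload != null) {
                action.accept(payload);
                processed++;
            }
        }
        return processed;
    }

    // Runs processPending at a fixed rate on its own thread, independent of
    // how fast inbound messages arrive. The caller shuts the executor down.
    ScheduledExecutorService startScheduler(long periodMillis, Consumer<String> action) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> processPending(action), periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

In this model the IDs are removed before the action runs, so a failure inside the action would lose that batch; removing them only after the action succeeds would instead risk processing a message twice. I would need the same trade-off to be expressible against the real state store.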
Any input or advice would be appreciated!