
We have a series of Kafka Streams applications that perform operations on messages. The apps have state stores.

Client records are fed through the apps and are enriched by using data in the state stores. The state stores are updated from other sources.

There are two types of asynchronous 'entry points' to our system:

Batch: We receive a large number of records in a batch, which are read into the stream to be processed.

API: Single records are fed into the stream to be processed, and a portfolio is updated that can be queried via the API (but only once it is ready).

Current system

Requirement: We want to allow synchronous API calls, i.e. a client sends a message via the API, it is processed through the stream, and the result is returned synchronously (fast enough for an API response). This can't work with the current system, because a single message could get 'stuck' behind a big batch and take a while to be processed.
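To make the 'stuck behind a big batch' problem concrete, here is a toy model in plain Java. It is not Kafka code: a `Deque` stands in for a single partition (which is consumed in order), and the class name, method names, and record labels are made up for illustration. It assumes the API record lands in the same partition right after the batch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class HeadOfLineDemo {

    /** Builds a queue holding `size` batch records, labelled batch-0 .. batch-(size-1). */
    public static Deque<String> batch(int size) {
        Deque<String> queue = new ArrayDeque<>();
        for (int i = 0; i < size; i++) {
            queue.add("batch-" + i);
        }
        return queue;
    }

    /**
     * Consumes the queue in order and returns how many records had to be
     * processed up to and including the first API record (-1 if there is none).
     */
    public static int recordsUntilApiRecord(Deque<String> queue) {
        int processed = 0;
        while (!queue.isEmpty()) {
            processed++;
            if (queue.poll().startsWith("api-")) {
                return processed;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Shared input: the API record is appended behind a 10,000-record batch.
        Deque<String> shared = batch(10_000);
        shared.add("api-1");

        // Dedicated input: the API record has a queue of its own.
        Deque<String> dedicated = new ArrayDeque<>();
        dedicated.add("api-1");

        System.out.println("shared input:    " + recordsUntilApiRecord(shared));    // 10001
        System.out.println("dedicated input: " + recordsUntilApiRecord(dedicated)); // 1
    }
}
```

With a shared input the API record's latency grows with the size of whatever batch happens to be ahead of it, which is why a separate 'fast' path is attractive.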

What we have tried: One idea was to have a fast and a slow queue: the same apps, but two deployments of them. We can do this, but the problem is that each app deployment would need its own state store, meaning we duplicate all our data into multiple state stores even though they are built from the same topics under the hood. (With large amounts of data, this becomes less sensible and more costly.)

Ideally we could have two deployments of the same app using the same state store, but we don't think this is possible in Kafka Streams?

If you think of a state store as a database table, then logically I don't see why two apps should not be able to access the same 'table', but maybe I'm missing something fundamental here.

Ideal scenario

Question: How can we solve 'batch' and 'real time' using the same system? If the answer is 'you shouldn't', please give reasons, as it seems to me that it makes sense. Thanks for any help!

Jed Arndt
  • Could you use two different input topics? One for the batch data and one for the API data? In that case, you build a single app with two sub-topologies that both do the same processing (just on different topics). The two sub-topologies are connected only by the state stores you want to share. (I guess you would need to use the Processor API to share state stores.) – Matthias J. Sax Oct 30 '20 at 00:26
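A rough sketch of what the comment describes might look like the following, using the Processor API (the `org.apache.kafka.streams.processor.api` package, available from Kafka 2.7). Everything named here is an assumption for illustration: the topic names (`batch-input`, `api-input`, `batch-output`, `api-output`), the store name `enrichment-store`, the String serdes, and the trivial "append the stored value" enrichment. How the store is populated from the other sources is left out.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.ProcessorSupplier;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class SharedStoreTopology {

    // Hypothetical store name; both processors are attached to this one store.
    static final String STORE = "enrichment-store";

    /** Looks up the record key in the shared store and appends what it finds. */
    static class EnrichProcessor implements Processor<String, String, String, String> {
        private ProcessorContext<String, String> context;
        private KeyValueStore<String, String> store;

        @Override
        public void init(ProcessorContext<String, String> context) {
            this.context = context;
            this.store = context.getStateStore(STORE);
        }

        @Override
        public void process(Record<String, String> record) {
            String reference = store.get(record.key()); // null if the key is unknown
            context.forward(record.withValue(record.value() + "|" + reference));
        }
    }

    public static Topology build() {
        ProcessorSupplier<String, String, String, String> enrich = EnrichProcessor::new;
        Topology topology = new Topology();

        // One source per entry point (topic names are assumptions).
        topology.addSource("batch-source",
                Serdes.String().deserializer(), Serdes.String().deserializer(), "batch-input");
        topology.addSource("api-source",
                Serdes.String().deserializer(), Serdes.String().deserializer(), "api-input");

        // Same processing logic on both paths.
        topology.addProcessor("batch-enrich", enrich, "batch-source");
        topology.addProcessor("api-enrich", enrich, "api-source");

        // A single store, connected to both processors.
        topology.addStateStore(
                Stores.keyValueStoreBuilder(
                        Stores.inMemoryKeyValueStore(STORE), Serdes.String(), Serdes.String()),
                "batch-enrich", "api-enrich");

        topology.addSink("batch-sink", "batch-output",
                Serdes.String().serializer(), Serdes.String().serializer(), "batch-enrich");
        topology.addSink("api-sink", "api-output",
                Serdes.String().serializer(), Serdes.String().serializer(), "api-enrich");
        return topology;
    }

    public static void main(String[] args) {
        // Prints the sources, processors, attached stores, and sinks.
        System.out.println(build().describe());
    }
}
```

Two things to keep in mind with this layout. Each task only sees its own shard of the store, so lookups on the API path only find the right data if both input topics are keyed and partitioned the same way. And because both paths now run inside the same application, whether the API path is actually fast enough while a large batch is in flight is something to measure rather than assume.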
