
I read about how event sourcing can be achieved using Apache Kafka as the event broker (link to the Confluent article).

If we take a look at this picture, it shows how an event is written into Kafka, and then Kafka Streams is used to create views in the database. My question here is: how can we use Kafka Streams for this? If I'm correct, it is a client library, so we need something that uses it, like a microservice called "Aggregate Service". Is this the right approach to implement such a design? Would it scale well?

texmelex
  • Is there a reason you're writing to an event queue rather than just writing to a Kafka topic? (Do you need the outbox pattern here?) – Oliver McPhee Aug 04 '22 at 15:09

2 Answers


Kafka Streams must first consume events from Kafka that have been "sourced" by some other process using a plain Kafka producer library.

Kafka Streams applications can only scale up to the number of partitions in their source topics, as they're built on the base Kafka consumer API.
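For illustration, here is a minimal sketch of that "sourcing" side, assuming a hypothetical account-events topic keyed by entity ID (topic name, key scheme, and payload are all made up):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by entity ID so all events for one entity land in one partition
            // (and are therefore totally ordered relative to each other).
            producer.send(new ProducerRecord<>("account-events", "account-42",
                    "{\"type\":\"Deposited\",\"amount\":100}"));
        }
    }
}
```

Because each partition is processed by a single Kafka Streams task, keying by entity ID also explains the scaling limit: the topic's partition count is the upper bound on parallelism.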

OneCricketeer

In that diagram, Kafka Streams is being used as a projection from the event store (the write-model for this application) to a read-model (a view of the data that's more optimized for performing queries).

The write side of the application could well be a service that receives commands and writes to an event store (which could be a DB purpose-built for this, like EventStore, or some other datastore used in a way that satisfies the contract for an event store). The broad contract for an event store is that it allows appending an event for some entity and provides a means to retrieve all events for a given entity after some point (often "the beginning of time", though it's also not uncommon to have a snapshot store, in which case that point is derived from the latest snapshot).
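As a rough sketch, that contract could be expressed as an interface like the following (the names EventStore, appendEvent, and eventsFor are illustrative, not any real library's API):

```java
import java.util.List;

// Illustrative event-store contract; not a real library's API.
public interface EventStore<E> {
    // Append an event for a given entity. Real implementations typically
    // also take an expected sequence number for optimistic concurrency.
    void appendEvent(String entityId, E event);

    // Retrieve all events for a given entity from some point onward;
    // fromSeqNr = 0 means "from the beginning of time". With a snapshot
    // store, fromSeqNr would be derived from the latest snapshot instead.
    List<E> eventsFor(String entityId, long fromSeqNr);
}
```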

Kafka is usable as an event store, especially if there are fairly few entities being event-sourced relative to the number of partitions: otherwise the "retrieve all events for a given entity" operation implies filtering out events for other entities, which at some point becomes prohibitively inefficient.
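To see why, consider what "retrieve all events for a given entity" looks like with Kafka as the store: replaying the entity's partition from the beginning and discarding everything else. A hedged sketch, again assuming events keyed by entity ID on a hypothetical account-events topic:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class EntityReplay {
    public static void replay(String entityId, int partition) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("account-events", partition);
            consumer.assign(List.of(tp));           // manual assignment, no consumer group
            consumer.seekToBeginning(List.of(tp));  // replay from the start
            // Single poll shown for brevity; a real replay would loop until
            // caught up. Every record belonging to another entity is still
            // read over the network and then discarded, which is the
            // inefficiency described above.
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                if (entityId.equals(record.key())) {
                    System.out.println(record.value());
                }
            }
        }
    }
}
```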

If not using Kafka as the event store but using Kafka Streams as a projection, then you'd likely have one of the following (a sketch of the subscribing Kafka Streams side appears after this list):

  • (high-level, e.g. using something like Akka Persistence to manage the event store; disclaimer: I am employed by Lightbend, which maintains Akka and provides commercial support and consulting around Akka) a projection from the event store publishing events to a Kafka topic to which Kafka Streams subscribes

  • (low-level, e.g. a hand-rolled library for treating a regular DB as an event store) change-data-capture (e.g. Debezium for MySQL/Postgres/etc.) publishing updates to the event store tables to a Kafka topic to which Kafka Streams subscribes
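In either case, the Kafka Streams side looks roughly the same: subscribe to the topic and materialize a read-optimized view. A minimal sketch, assuming a hypothetical account-events topic and using a per-entity event count as a stand-in for a real read-model update:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;

public class ProjectionService {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "read-model-projection");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Count events per entity key and materialize the result as a
        // queryable state store; a real projection would fold each event
        // into a view row (e.g. in an external DB) instead.
        builder.stream("account-events")
               .groupByKey()
               .count(Materialized.as("account-event-counts"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

This is essentially the "Aggregate Service" from the question: an ordinary deployable application that embeds the Kafka Streams library, of which you can run multiple instances, scaling up to the source topic's partition count.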

Levi Ramsey