
I am learning Apache Kafka (as a messaging system) and in the process came across the term StateStore (link here).

I am also aware of Apache Kafka Streams, the client API.

Is StateStore applicable to Apache Kafka in the context of messaging systems, or is it applicable to Apache Kafka Streams?

Does Apache have its "own" implementation of StateStore, or does it use a third-party implementation (for example, RocksDB)?

Can anyone help me understand this?

OneCricketeer
CuriousMind
  • You can implement a "cache" using any messaging system capable of storing "persistent state" – OneCricketeer Dec 29 '19 at 03:49
  • @cricket_007: Thanks for your comments. If you could elaborate a bit, it would be helpful. – CuriousMind Jan 02 '20 at 12:44
  • Start a consumer. Put all messages into a HashMap or add a dependency on a library like Caffeine – OneCricketeer Jan 02 '20 at 21:38
  • One "nice" thing about Kafka StateStore is that you can extend it to external systems, if needed. For example, https://github.com/andreas-schroeder/redisks – OneCricketeer Jan 03 '20 at 09:27
  • @cricket_007: Thanks for the additional details. So `StateStore` is just an interface, and we can have any implementation of it (a sort of SPI)? By the way, does Apache have any native implementation of `StateStore` other than the one using `RocksDB`? – CuriousMind Jan 03 '20 at 14:07

2 Answers


The other answer gives a good, concise explanation of StateStore in the context of Kafka Streams. Here is a broader overview that relates it to your question.

Kafka Broker in a nutshell

In a messaging context, your work, simplified, would be:

  1. Publishing state (producing messages)

  2. Saving messages for a period of time for later consumption (retention time)

  3. Consuming state (getting the messages)

In a nutshell, #2, plus fault tolerance and keeping track of how far each consumer group has read (its offsets), is what a Kafka broker does for you.
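To make the three steps and the broker's role concrete, here is a minimal in-memory sketch. It is a conceptual toy, not Kafka's API or storage format: `ToyBroker`, its methods, and the `now` parameter are all hypothetical names chosen for illustration, and offsets are advanced automatically on read, whereas real consumers commit them.

```python
from collections import defaultdict


class ToyBroker:
    """Conceptual model of a broker: an append-only log, a retention
    period, and one read position (offset) per consumer group."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.log = []                      # entries are (offset, timestamp, message)
        self.next_offset = 0
        self.committed = defaultdict(int)  # consumer group -> next offset to read

    def produce(self, message, now):
        # Step 1: publishing appends to the end of the log.
        self.log.append((self.next_offset, now, message))
        self.next_offset += 1

    def expire(self, now):
        # Step 2: messages are kept only for the retention period.
        self.log = [e for e in self.log if now - e[1] <= self.retention_seconds]

    def consume(self, group, max_messages=10):
        # Step 3: each group reads from its own position and then advances it.
        start = self.committed[group]
        batch = [e for e in self.log if e[0] >= start][:max_messages]
        if batch:
            self.committed[group] = batch[-1][0] + 1
        return [message for _, _, message in batch]


broker = ToyBroker(retention_seconds=60)
broker.produce("a", now=0)
broker.produce("b", now=30)

first_read = broker.consume("billing")   # both messages
second_read = broker.consume("billing")  # nothing new for this group

broker.expire(now=70)                    # "a" is now older than the retention period
late_read = broker.consume("audit")      # a new group only sees what is still retained
```

Note that the two groups read independently: the "billing" group's position does not affect what "audit" receives.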

Kafka client APIs

Apart from that, Kafka provides client libraries for common patterns of working with messages:

  • Producer - Publish messages to Kafka topics

  • Consumer - Subscribe to Kafka topics

  • Connect - Create reliable integrations with external stores such as various DBMS.

  • Streams - DSL and utilities aimed at simplifying the development of common streaming application patterns.

  • Admin - Programmatically manage / monitor Kafka resources.

Kafka Streams State Stores

I'll quote the great explanation from the Streams Architecture docs (I highly recommend the Kafka docs, as they are very well written and suit any level of experience).

Kafka Streams provides so-called state stores, which can be used by stream processing applications to store and query data, which is an important capability when implementing stateful operations. The Kafka Streams DSL, for example, automatically creates and manages such state stores when you are calling stateful operators such as join() or aggregate(), or when you are windowing a stream.

As you can see, a StateStore extends the built-in abilities from processing a single message at a time to processing across multiple messages, thus enabling more complex functions over a group of messages (all the messages within a time window, aggregation functions over several messages, etc.).
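As a rough illustration of why multi-message processing needs state, here is a small sketch of a windowed count. It does not use the Kafka Streams API: a plain dictionary stands in for the state store, and `CountingProcessor`, `window_start`, and `WINDOW_SIZE_MS` are hypothetical names.

```python
from collections import defaultdict

WINDOW_SIZE_MS = 1000  # hypothetical tumbling-window size


def window_start(timestamp_ms):
    # Align a timestamp to the start of its tumbling window.
    return timestamp_ms - (timestamp_ms % WINDOW_SIZE_MS)


class CountingProcessor:
    """Counts events per key and window. The dictionary plays the role of
    the state store: it remembers results between individual messages."""

    def __init__(self):
        self.store = defaultdict(int)  # (key, window start) -> count so far

    def process(self, key, timestamp_ms):
        slot = (key, window_start(timestamp_ms))
        self.store[slot] += 1
        return self.store[slot]


processor = CountingProcessor()
events = [("user-1", 100), ("user-1", 900), ("user-2", 950), ("user-1", 1200)]
running_counts = [processor.process(key, ts) for key, ts in events]
```

Without the stored counts, each call to `process` would only ever see one message and could not produce a running total.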

I'll add that RocksDB is the default implementation used by Kafka Streams, and that it can be changed, as mentioned in the other answer.

Also, if you want to explore more, here is a link to the great intro videos from Apache Kafka's official docs:

Have an awesome learning experience!

matanz

StateStore applies to the Kafka Streams context.

Some operations, like reduce or aggregate, are stateful. Kafka Streams uses state stores to manage their state. By default, it uses RocksDB, but this is customizable.
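The idea that the store is customizable can be sketched as a reduce step written against a small store interface, with the backend swapped freely. This is a conceptual toy under stated assumptions, not the Kafka Streams interfaces: `ToyKeyValueStore`, `InMemoryStore`, `WriteLoggingStore`, and `reduce_stream` are hypothetical names.

```python
from abc import ABC, abstractmethod


class ToyKeyValueStore(ABC):
    """Hypothetical minimal store interface the reduce step depends on."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, value): ...


class InMemoryStore(ToyKeyValueStore):
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class WriteLoggingStore(ToyKeyValueStore):
    """An alternative backend: wraps another store and records every write."""

    def __init__(self, inner):
        self.inner = inner
        self.writes = []

    def get(self, key):
        return self.inner.get(key)

    def put(self, key, value):
        self.writes.append((key, value))
        self.inner.put(key, value)


def reduce_stream(records, reducer, store):
    # Stateful reduce: combine each new value with the value stored for its key.
    for key, value in records:
        current = store.get(key)
        store.put(key, value if current is None else reducer(current, value))
    return store


records = [("a", 1), ("b", 5), ("a", 2)]
plain = reduce_stream(records, lambda x, y: x + y, InMemoryStore())
logged = reduce_stream(records, lambda x, y: x + y, WriteLoggingStore(InMemoryStore()))
```

The reduce logic is identical in both runs; only the store behind the interface differs, which is the sense in which the default backend can be replaced.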

  • "By default, it uses rocksDB" - where is the data stored by default? I see that it works fine in my case without specifying a path. – RamPrakash Jul 09 '23 at 18:45