
I am new to Kafka and have tried to create a small Kafka KTable implementation. I successfully added a KTable and was able to query it. I used a local state store and it worked as expected. Below is my local state store config:

    @Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
    public KafkaStreamsConfiguration kafkaConfiguration(final KafkaProperties kafkaProperties) {
        Map<String, Object> config = new HashMap<>();
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getBootstrapServers());
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, kafkaProperties.getClientId());
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, MessageSerdes.class.getName());
        config.put(StreamsConfig.STATE_DIR_CONFIG, directory);
        // TODO: verify error strategy
        config.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, LogAndContinueExceptionHandler.class);
        return new KafkaStreamsConfiguration(config);
    }
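
For reference, the table itself is built and queried roughly like the sketch below (the topic name "messages", the store name "message-store", and the Message type are placeholders, and MessageSerdes is assumed to implement Serde<Message>):

    @Bean
    public KTable<String, Message> messageTable(StreamsBuilder streamsBuilder) {
        // Materialize the topic into a named, locally queryable state store.
        return streamsBuilder.table(
                "messages",
                Materialized.<String, Message, KeyValueStore<Bytes, byte[]>>as("message-store")
                        .withKeySerde(Serdes.String())
                        .withValueSerde(new MessageSerdes()));
    }

    public Message findMessage(KafkaStreams streams, String key) {
        // Query the local state store once the streams instance is RUNNING.
        ReadOnlyKeyValueStore<String, Message> store = streams.store(
                StoreQueryParameters.fromNameAndType("message-store", QueryableStoreTypes.keyValueStore()));
        return store.get(key);
    }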

Now I want to use Global State using RPC, and I am confused by a few questions. To add a Global State Store I need to add an RPC endpoint:

    config.put(StreamsConfig.APPLICATION_SERVER_CONFIG, "127.0.0.1:8080");
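
For a Spring Boot application with the web dependency, I assume this value could also be derived from the embedded server rather than hard-coded; a rough sketch (server.port is the standard Spring Boot property, the host part is just a placeholder):

    // Sketch: build application.server from the embedded web server's port.
    @Value("${server.port}")
    private int serverPort;

    // ... inside kafkaConfiguration(...):
    config.put(StreamsConfig.APPLICATION_SERVER_CONFIG, "localhost:" + serverPort);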

The documentation says:

"The only requirements are that the RPC layer is embedded within the Kafka Streams application"

  • Does this mean we need to create a client endpoint within the Kafka application? If so, and it's a Spring Boot application with the web dependency, is it just "localhost:8080"?
  • How will other instances of this application connect, only via APPLICATION_SERVER_CONFIG (application.server), to perform interactive queries or keep the state in sync? In other words, how do I provide additional configuration for other instances of the same application so that the global state stays in sync?
  • If global state is created, do we need to keep a backup in MongoDB or somewhere else for whatever reason (fault tolerance)? Considering a DB will never be as fast as writing to disk, do we even care about it, or should we rely on the distributed architecture?

It would be great if a Kafka Global State Store implementation with an example could be provided.

Kumar Pallav
  • State is always "local" -- the difference between a `KTable` and a `GlobalKTable` is whether the state is partitioned/sharded (for `KTable`s) or broadcasted/replicated (for `GlobalKTable`s) over all your application instances. In both cases, the data is replicated in the Kafka cluster and thus you don't need to back up the stores -- they are fault-tolerant out-of-the-box. – Matthias J. Sax Mar 21 '20 at 23:58

1 Answer


First of all, this is not global state; if you want to use global state, you should build a GlobalKTable instead of a KTable. When you materialize your KTable into a state store, the state store gets partitioned and those partitions are distributed across your application instances, and each instance can only query the partitions it hosts, hence the name local state. You can access your other instances' stores by adding an RPC layer to each of your application instances.
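
For illustration, a minimal sketch of both variants (the topic name "messages", the store names, and the Message type are placeholders; default serdes come from the streams config):

    StreamsBuilder builder = new StreamsBuilder();

    // KTable: the "messages" topic is sharded across instances; each instance
    // materializes and can query only its own partitions of the store.
    KTable<String, Message> table = builder.table(
            "messages",
            Materialized.<String, Message, KeyValueStore<Bytes, byte[]>>as("message-store"));

    // GlobalKTable: every instance keeps a full, replicated copy of the topic,
    // so any key can be looked up locally without an RPC hop.
    GlobalKTable<String, Message> globalTable = builder.globalTable(
            "messages",
            Materialized.<String, Message, KeyValueStore<Bytes, byte[]>>as("message-global-store"));

With a GlobalKTable you trade the RPC layer for every instance holding the full data set.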

  1. Do you mean a server endpoint? Yes.
  2. Kafka docs state that Kafka Streams will keep track of the RPC endpoint information for every instance of an application, its state stores, and assigned stream partitions through instances of StreamsMetadata.

Using the StreamsMetadata instance you can get the HostStoreInfo of the application instance that hosts the partition containing the key you want to query (see the sketch after this list).

  3. In your case (using a KTable), it's local state. It is backed by an internal Kafka changelog topic with log compaction enabled, so your local state is fault tolerant; it gets restored from this changelog topic during startup. The changelog topic has the format:

    <application.id>-<your-local-state-store-name>-changelog
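
Going back to the StreamsMetadata lookup above, a sketch of the per-key routing (the store name, the thisHost variable, and the fetchOverHttp helper are assumptions/hypothetical):

    // Ask Kafka Streams which instance hosts the partition for this key.
    StreamsMetadata metadata = streams.metadataForKey(
            "message-store", key, Serdes.String().serializer());
    HostInfo owner = metadata.hostInfo();

    if (owner.equals(thisHost)) {
        // The key lives on this instance: read the local store directly.
        ReadOnlyKeyValueStore<String, Message> store = streams.store(
                StoreQueryParameters.fromNameAndType("message-store", QueryableStoreTypes.keyValueStore()));
        return store.get(key);
    } else {
        // The key lives on another instance: forward the request over your RPC
        // layer, e.g. an HTTP call to owner.host() + ":" + owner.port().
        return fetchOverHttp(owner.host(), owner.port(), key); // hypothetical helper
    }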

You can view an example of how to query remote state stores for the entire app here.
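
As a rough idea of what an app-wide query can look like (the store name and the hypothetical fetchAllFromHost helper are assumptions), each instance can enumerate every host that holds a shard of the store and merge the results:

    // Enumerate every instance that hosts a shard of the store.
    Collection<StreamsMetadata> hosts = streams.allMetadataForStore("message-store");

    Map<String, Message> merged = new HashMap<>();
    for (StreamsMetadata host : hosts) {
        if (host.hostInfo().equals(thisHost)) {
            // Read the shard held by this instance.
            ReadOnlyKeyValueStore<String, Message> store = streams.store(
                    StoreQueryParameters.fromNameAndType("message-store", QueryableStoreTypes.keyValueStore()));
            try (KeyValueIterator<String, Message> it = store.all()) {
                it.forEachRemaining(kv -> merged.put(kv.key, kv.value));
            }
        } else {
            // Fetch the remote shard over the RPC layer (hypothetical helper).
            merged.putAll(fetchAllFromHost(host.hostInfo().host(), host.hostInfo().port()));
        }
    }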

Tuyen Luong