4

I have a compacted topic with approximately 30 million keys. My app materializes this topic into a KeyValueStore.

How can I check whether the KeyValueStore is completely populated? When I look up a key via an interactive query, I need to know whether the key is missing because the state store is not ready yet or because the key really does not exist.

I materialize the StateStore this way:


  @Bean
  public Consumer<KTable<Key, Value>> process() {
    return stream -> stream.filter((k, v) -> v != null,
        Materialized.<Key, Value, KeyValueStore<Bytes, byte[]>>as("stateStore")
            .withKeySerde(new KeySerde())
            .withValueSerde(new ValueSerde()));
  }
Frank

2 Answers

4

In general, there is no such thing as "fully loaded": at any point after the application has started, new data might be written to the input topic, and this new data would be read to update the corresponding table.

What you can do is monitor consumer lag: within your application, KafkaStreams#metrics() gives you access to all client (i.e., consumer/producer) and Kafka Streams metrics. The consumer exposes a metric called records-lag-max that may help.

Of course, during normal processing (assuming new data is written to the input topic continuously), consumer lag will keep going up and down.
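As a rough illustration of the lag-based check, here is a minimal sketch. The class `LagGate`, its threshold, and the `Map<String, Double>` input are hypothetical and not part of Kafka. The map is assumed to hold the current records-lag-max value per consumer client, which you could collect from KafkaStreams#metrics() by selecting entries whose MetricName#name() equals "records-lag-max" and reading Metric#metricValue(). Treating an empty map or a NaN value as "not caught up yet" is an assumption of this sketch.

```java
import java.util.Map;

/** Hypothetical helper: decides whether the table has caught up, based on lag values. */
final class LagGate {

    private final double maxAllowedLag;

    LagGate(double maxAllowedLag) {
        this.maxAllowedLag = maxAllowedLag;
    }

    /**
     * @param lagByClient assumed to map a consumer client id to its current
     *                    records-lag-max value (filled from KafkaStreams#metrics()
     *                    by the caller; that wiring is not shown here)
     * @return true only if every reported lag is a real number at or below the threshold
     */
    boolean isCaughtUp(Map<String, Double> lagByClient) {
        if (lagByClient.isEmpty()) {
            return false; // no metrics reported yet, so nothing is known
        }
        for (Double lag : lagByClient.values()) {
            // Assumption: a missing or NaN value means the consumer has not measured lag yet.
            if (lag == null || lag.isNaN() || lag > maxAllowedLag) {
                return false;
            }
        }
        return true;
    }
}
```

Since lag fluctuates during normal operation, a check like this is probably most useful once at startup, for example to flip a readiness flag the first time it returns true.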

Matthias J. Sax
-1

Update: I misunderstood the OP's question. I read it as being about the state store restore process rather than "how to check whether the topology has finished materializing the input topic into the state store".

You can only get the KeyValueStore from your KafkaStreams instance once its state has changed from REBALANCING to RUNNING. You can check for this state transition by using a StreamsBuilderFactoryBeanCustomizer to access the underlying KafkaStreams instance. If you just want to know when all state stores have been fully populated and the Kafka Streams threads are ready, so that you can get a KeyValueStore, then you can register a StateListener:

@Bean
public StreamsBuilderFactoryBeanCustomizer onKafkaStateChangeFromRebalanceToRunning() {
    return factoryBean -> factoryBean.setStateListener((newState, oldState) -> {
        if (newState == KafkaStreams.State.RUNNING && oldState == KafkaStreams.State.REBALANCING) {
            // Set a flag: the `stateStore` store of this KafkaStreams instance has been fully restored,
            // so it can now be retrieved.
        }
    });
}

Or, if you want to get the store from the KafkaStreams instance:

@Bean
public StreamsBuilderFactoryBeanCustomizer streamsBuilderFactoryBeanCustomizer() {
    return factoryBean -> factoryBean.setKafkaStreamsCustomizer((KafkaStreamsCustomizer) kafkaStreams -> {
        kafkaStreams.setStateListener((newState, oldState) -> {
            if (newState == KafkaStreams.State.RUNNING && oldState == KafkaStreams.State.REBALANCING) {
                // Get and assign your store using kafkaStreams.store("stateStore", QueryableStoreTypes.keyValueStore()),
                // and set a flag that the `stateStore` store of this KafkaStreams instance has been fully restored.
            }
        });
    });
}

Read more in the docs.

Note that there should be only one instance of StreamsBuilderFactoryBeanCustomizer.
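To show how such a flag could be used on the query side, here is a minimal sketch of a lookup that distinguishes "store not ready" from "key absent". Everything in it is hypothetical: `GuardedLookup`, `Status`, and `Result` are illustrative names, and the `Function<K, V>` stands in for a read such as ReadOnlyKeyValueStore#get. The flag would be set from the state listener above. Note that, per the comments on this answer, RUNNING does not by itself mean the input topic has been fully consumed, so the flag is only as reliable as the signal that sets it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

/** Hypothetical wrapper: reports NOT_READY until a readiness flag has been set. */
final class GuardedLookup<K, V> {

    enum Status { NOT_READY, ABSENT, FOUND }

    static final class Result<V> {
        final Status status;
        final V value; // non-null only when status == FOUND

        Result(Status status, V value) {
            this.status = status;
            this.value = value;
        }
    }

    private final AtomicBoolean ready = new AtomicBoolean(false);
    private final Function<K, V> store; // stand-in for the real store read

    GuardedLookup(Function<K, V> store) {
        this.store = store;
    }

    /** Call with true from the state listener, and with false if the app leaves RUNNING. */
    void setReady(boolean isReady) {
        ready.set(isReady);
    }

    Result<V> get(K key) {
        if (!ready.get()) {
            return new Result<>(Status.NOT_READY, null);
        }
        V value = store.apply(key);
        if (value == null) {
            return new Result<>(Status.ABSENT, null);
        }
        return new Result<>(Status.FOUND, value);
    }
}
```

A caller can then map NOT_READY to something like an HTTP 503 and ABSENT to a 404, instead of returning "not found" for both cases.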

Tuyen Luong
  • My observation is that although the state of the stream is already `RUNNING` there are still elements _flowing in_ the StateStore. – Frank Mar 12 '20 at 20:46
  • By `flowing in`, do you mean that the stateStore is still restoring? How do you check this? – Tuyen Luong Mar 13 '20 at 00:22
  • It seems I misunderstood you; I thought you were asking about the restore process. I will update the answer. – Tuyen Luong Mar 13 '20 at 07:17
  • I added the `stateStore#approximateNumEntries` to a custom `HealthIndicator` and this number is still increasing although the overall `/health` is already `UP`. – Frank Mar 13 '20 at 08:09
  • For the provided program, there is nothing to be restored on initial startup -- the application will start processing with an empty state -- and for the given program, processing _means_ writing into the store... – Matthias J. Sax Mar 13 '20 at 09:35