    StreamsBuilder builder = new StreamsBuilder();

    Map<String, ?> serdeConfig = Collections.singletonMap(SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryUrl);

    Serde keySerde = getSerde(keyClass);
    keySerde.configure(serdeConfig, true);

    Serde valueSerde = getSerde(valueClass);
    valueSerde.configure(serdeConfig, false);

    StoreBuilder<KeyValueStore<K, V>> store =
        Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore("mystore"),
            keySerde, valueSerde).withCachingEnabled();

    builder.addGlobalStore(store, "mytopic", Consumed.with(keySerde, valueSerde), this::processMessage);

    streams = new KafkaStreams(builder.build(), properties);

    registerShutdownHook();

    streams.start();

    readOnlyKeyValueStore = waitUntilStoreIsQueryable("mystore", QueryableStoreTypes.<Object, V>keyValueStore(), streams);

    private <T> T waitUntilStoreIsQueryable(final String storeName,
          final QueryableStoreType<T> queryableStoreType,
          final KafkaStreams streams) {

        // retry up to 250 times, 100 ms apart (roughly 25 seconds)
        long timeout = 250;

        while (timeout > 0) {
            try {
                timeout--;
                return streams.store(storeName, queryableStoreType);
            } catch (InvalidStateStoreException ignored) {
                // store not yet ready for querying
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    // restore the interrupt flag so the caller can react to it
                    Thread.currentThread().interrupt();
                    logger.error(e);
                }
            }
        }
        throw new StreamsException("ReadOnlyKeyValueStore is not queryable within 25 seconds");
    }

The error is as follows:

19:42:35.049 [my_component.app-91fa5d9f-aba8-4419-a063-93635903ff5d-GlobalStreamThread] ERROR org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer - global-stream-thread [my_component.app-91fa5d9f-aba8-4419-a063-93635903ff5d-GlobalStreamThread] Updating global state failed. You can restart KafkaStreams to recover from this error.
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {my_component-0=6}
    at org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:990) ~[kafka-clients-2.2.1.jar:?]
    at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:491) ~[kafka-clients-2.2.1.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1269) ~[kafka-clients-2.2.1.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1200) ~[kafka-clients-2.2.1.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1176) ~[kafka-clients-2.2.1.jar:?]
    at org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer.pollAndUpdate(GlobalStreamThread.java:239) [kafka-streams-2.3.0.jar:?]
    at org.apache.kafka.streams.processor.internals.GlobalStreamThread.run(GlobalStreamThread.java:290) [kafka-streams-2.3.0.jar:?]
19:42:35.169 [my_component.app-91fa5d9f-aba8-4419-a063-93635903ff5d-GlobalStreamThread] ERROR org.apache.kafka.streams.KafkaStreams - stream-client [my_component.app-91fa5d9f-aba8-4419-a063-93635903ff5d] Global thread has died. The instance will be in error state and should be closed.
19:42:35.169 [my_component.app-91fa5d9f-aba8-4419-a063-93635903ff5d-GlobalStreamThread] ERROR org.apache.zookeeper.server.NIOServerCnxnFactory - Thread Thread[my_component.app-91fa5d9f-aba8-4419-a063-93635903ff5d-GlobalStreamThread,5,main] died
org.apache.kafka.streams.errors.StreamsException: Updating global state failed. You can restart KafkaStreams to recover from this error.
    at org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer.pollAndUpdate(GlobalStreamThread.java:250) ~[kafka-streams-2.3.0.jar:?]
    at org.apache.kafka.streams.processor.internals.GlobalStreamThread.run(GlobalStreamThread.java:290) ~[kafka-streams-2.3.0.jar:?]
Caused by: org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {my_component-0=6}
    at org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:990) ~[kafka-clients-2.2.1.jar:?]
    at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:491) ~[kafka-clients-2.2.1.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1269) ~[kafka-clients-2.2.1.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1200) ~[kafka-clients-2.2.1.jar:?]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1176) ~[kafka-clients-2.2.1.jar:?]
    at org.apache.kafka.streams.processor.internals.GlobalStreamThread$StateConsumer.pollAndUpdate(GlobalStreamThread.java:239) ~[kafka-streams-2.3.0.jar:?]
    ... 1 more

org.apache.kafka.streams.errors.InvalidStateStoreException: State store is not available anymore and may have been migrated to another instance; please re-discover its location from the state metadata.

    at org.apache.kafka.streams.state.internals.CompositeReadOnlyKeyValueStore.get(CompositeReadOnlyKeyValueStore.java:60)

I see two different exceptions.

  1. InvalidStateStoreException - store is not open

  2. InvalidStateStoreException - Store is not available any more and might have migrated to another instance

I have only one instance of the streams application running on Windows, with a single application ID.

In the above code I wait until the store is queryable, but I still get "store is not open" and "store may not be available".

What are the possible reasons for these exceptions, and how do I solve them?

First of all, is the above code correct?

JavaTechnical
  • What is the output of `bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list --topic my_component --time -2` ? – Giorgos Myrianthous Jul 25 '19 at 14:55
  • @GiorgosMyrianthous 0:0, 1:0, 2:0 – JavaTechnical Jul 26 '19 at 04:18
  • Stores might be closed at any point -- you need to catch exceptions like this, and rediscover the store. – Matthias J. Sax Jul 26 '19 at 16:51
  • @MatthiasJ.Sax Do you mean to say that we should have UncaughtExceptionHandler call this.. But *how to rediscover the store?* – JavaTechnical Jul 27 '19 at 05:04
  • `UncaughtExceptionHandler` does not help. By "rediscover" I mean you need to get a new store handle via `KafkaStreams#store()` and the provided `KafkaStreams#metadata()`, as the store may have been moved to another instance. – Matthias J. Sax Jul 29 '19 at 01:00
  • @MatthiasJ.Sax I tried calling `KafkaStreams.store()` again when InvalidStateStoreException came on `readOnlyKeyValueStore.get()`, but then I got that streams is not running and that its state is ERROR, i.e. the same error I pasted above. – JavaTechnical Jul 30 '19 at 10:43
  • If KafkaStreams is in error state, you need to `close()` the client and create a new instance to restart it, as indicated by the error message: `You can restart KafkaStreams to recover from this error.` – Matthias J. Sax Jul 30 '19 at 18:41
  • @MatthiasJ.Sax But to build the object we require the topology and properties; it would be better if KafkaStreams offered `streams.getTopology()` and `getProperties()` methods. Otherwise, I have to keep track of them myself. – JavaTechnical Jul 31 '19 at 02:47
  • @MatthiasJ.Sax Also, is cleanUp() required (since it is offsetsoutofrangeexception?) – JavaTechnical Jul 31 '19 at 02:49
  • I hear you. Unfortunately, this is the way KafkaStreams works atm -- feel free to open a Jira for it -- I agree that there is room for improvement in recovering from an ERROR state. -- Calling `cleanUp()` is not required and not recommended for this case -- you should only call `cleanUp()` if you want to wipe out local state (for example, if you want to reset your application to reprocess data). – Matthias J. Sax Aug 01 '19 at 01:54
  • @MatthiasJ.Sax I have posted my answer for that. Can you check? – JavaTechnical Aug 02 '19 at 05:29
  • @MatthiasJ.Sax Does this OffsetOutOfRangeException come even when the retention period is over (i.e., if the topic is cleaned up or compacted) and my store still contains the offset (some previously existing value)? How does the streams application know that topic retention has deleted the values that it caches? – JavaTechnical Aug 09 '19 at 06:18
  • Yes, if, for example, topic retention kicks in and a consumer tries to read from an expired offset, you would get `OffsetOutOfRangeException`. Kafka Streams may only detect this case if it tries to read the data from the brokers. – Matthias J. Sax Aug 09 '19 at 14:21
  • @MatthiasJ.Sax So, for example, if the streams app goes down at offset 3 and the topic is cleaned up due to its retention period, then the streams application will check for offset 3 in the topic, it does not exist, and the streams app will exit with OffsetOutOfRangeException. But this should not be the ideal scenario; I feel that it should re-build its store instead! – JavaTechnical Aug 12 '19 at 06:29
  • Moreover, you said it may detect the case only if it tries to read data from the brokers. This may happen at any time, and it can read after the retention period is over, so this exception is bound to come for topics whose retention period has expired. – JavaTechnical Aug 12 '19 at 06:31
  • For regular tasks and during the bootstrapping of global state stores, Kafka Streams will catch the exception and do the cleanup automatically (since 1.1.0: https://issues.apache.org/jira/browse/KAFKA-6121 -- maybe you are using an older version?) – Matthias J. Sax Aug 12 '19 at 15:39
  • However, the stack trace in the question indicates that the error occurs during regular processing/updating of the global store (-> `pollAndUpdate()`). For this case, it's treated as a fatal error, because it indicates that the application lagged for longer than the retention time (even if the application was online). – Matthias J. Sax Aug 12 '19 at 15:41
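
Putting the advice from these comments together, a rough sketch of the "rediscover or restart" pattern might look like the following. This is only an illustration: `buildTopology()` is an assumed helper that rebuilds the topology shown above, and `properties` is the same configuration object passed to the original instance; neither is part of the question's code.

    // Sketch: re-fetch the store handle on InvalidStateStoreException; if the client
    // has reached the ERROR state, close it and start a fresh KafkaStreams instance.
    private V getFromStore(Object key) {
        try {
            return readOnlyKeyValueStore.get(key);
        } catch (InvalidStateStoreException e) {
            if (streams.state() == KafkaStreams.State.ERROR) {
                // The global thread died; this instance cannot recover in place.
                streams.close();
                streams = new KafkaStreams(buildTopology(), properties);  // buildTopology() is an assumed helper
                streams.start();
            }
            // Re-discover the store: get a fresh handle via KafkaStreams#store().
            readOnlyKeyValueStore = waitUntilStoreIsQueryable(
                "mystore", QueryableStoreTypes.<Object, V>keyValueStore(), streams);
            return readOnlyKeyValueStore.get(key);
        }
    }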

1 Answer


OffsetOutOfRangeException means that the offsets stored in the state's .checkpoint file are out of range with respect to the offsets of the topic in the Kafka cluster.

This happens when the topic is cleared and/or re-created, so it may no longer contain as many messages as the offsets recorded in the checkpoint imply.

I have found that resetting the .checkpoint file helps. The .checkpoint file looks something like this:

0
1
my_component 0  6
my_component 1  0

In the entry `my_component 0  6`, 0 is the partition and 6 is the offset; similarly, in `my_component 1  0`, 1 is the partition and 0 is the offset.

The entry my_component-0=6 in the exception means that offset 6 of partition 0 of the my_component topic is out of range.

Since the topic was re-created, offset 6 does not exist anymore, so change 6 to 0.
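
For example, a rough one-off sketch that resets every offset entry in a .checkpoint file to 0 could look like this. The file path is an assumption (default state.dir with application.id my_component.app and a global store); adjust it to your setup before using anything like it.

    // Hypothetical one-off utility: reset all offsets in a .checkpoint file to 0.
    // The path below is an assumption; point it at your actual checkpoint file.
    import java.nio.file.*;
    import java.util.*;

    public class ResetCheckpoint {
        public static void main(String[] args) throws Exception {
            Path checkpoint = Paths.get("/tmp/kafka-streams/my_component.app/global/.checkpoint");
            List<String> fixed = new ArrayList<>();
            for (String line : Files.readAllLines(checkpoint)) {
                String[] parts = line.trim().split("\\s+");
                // Entries look like "topic partition offset"; the version and count header lines stay as-is.
                fixed.add(parts.length == 3 ? parts[0] + " " + parts[1] + " 0" : line);
            }
            Files.write(checkpoint, fixed);
        }
    }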


It is important to note that, while unit testing with Kafka, you must clean up the state directory once the test is complete, because your embedded Kafka cluster and its topics do not exist after the test finishes, and so it makes no sense to retain the offsets in your state store (they will become stale).

So, ensure that your state directory (typically /tmp/kafka-streams, or C:\tmp\kafka-streams on Windows) is cleaned up after the test.
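
For example, a minimal teardown sketch (assuming a JUnit 4 test with a `streams` field) that wipes the local state via KafkaStreams#cleanUp():

    // Minimal teardown sketch: wipe local state after an embedded-Kafka test.
    @After
    public void tearDown() {
        // cleanUp() may only be called while the instance is not running, so close first.
        streams.close();
        // Deletes this application's local state directory under state.dir,
        // including the stale .checkpoint offsets.
        streams.cleanUp();
    }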

Also, resetting the checkpoint file is only a workaround; it is not an ideal solution in production.


In production, if the state store is incompatible with its corresponding topic (that is, the offsets are out of range), it means there is some corruption; possibly someone deleted and re-created the topic.

In such a situation, I think a cleanup might be the only possible solution, because your state store contains stale information which is no longer valid (as far as the new topic is concerned).

JavaTechnical