We are facing an issue with a Kafka Streams state store that has a standby replica.
We have two topologies (streams): one without any internal state store and one with an internal state store.
The problem is with the topology that has the state store, so the information below is about that topology only:
The active and standby tasks of that stream run in two separate Kubernetes pods on two nodes.
The active task consumes Kafka messages, processes them, and writes them to the state store. When the pod that holds the active task is killed, the standby task is assigned the changelog topics. But the restore consumer seeks to the EARLIEST offset, and since the changelog topics hold thousands of messages, the restore takes a very long time to complete.
I was expecting the latest offset from the previous active task to be available to the standby (now active) task, so that it would not need to replay all of the events.
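To give a sense of how much data a full replay covers, the beginning and end offsets of the two changelog partitions can be compared with a plain consumer. This is just a quick sketch; the topic names are taken from the restore-consumer logs below, and bootstrapServer is the same value used in the configuration:

// Sketch: estimate how many records a restore from EARLIEST has to replay.
// Changelog topic names come from the restore-consumer logs further down.
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);

try (KafkaConsumer<byte[], byte[]> consumer =
             new KafkaConsumer<>(props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
    List<TopicPartition> changelogs = Arrays.asList(
            new TopicPartition("myService-agent-state-store-changelog", 0),
            new TopicPartition("myService-service-matches-state-store-changelog", 0));
    Map<TopicPartition, Long> begin = consumer.beginningOffsets(changelogs);
    Map<TopicPartition, Long> end = consumer.endOffsets(changelogs);
    changelogs.forEach(tp -> System.out.printf("%s: %d records between EARLIEST and the end offset%n",
            tp, end.get(tp) - begin.get(tp)));
}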
Please correct me if I am wrong; I have run out of options and don't know where to check further.
I am using kafka-streams version 2.5.0
Configuration:
These are the configured values for the topology:
Properties properties = new Properties();
properties.put(StreamsConfig.APPLICATION_ID_CONFIG, "myService");
properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
properties.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, ONE_THOUSAND);
properties.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
properties.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, "exactly_once");
properties.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
properties.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
properties.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 6000);
properties.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, THREE);
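To see exactly which offset range each restoration covers, a global state restore listener can be attached to the KafkaStreams instance built from these properties. This is only a sketch (we have not wired it in yet, and log stands in for an slf4j logger):

// Sketch: a restore listener that logs where each changelog restoration starts and ends.
// Must be registered before streams.start().
KafkaStreams streams = new KafkaStreams(topology, properties);
streams.setGlobalStateRestoreListener(new StateRestoreListener() {
    @Override
    public void onRestoreStart(TopicPartition topicPartition, String storeName,
                               long startingOffset, long endingOffset) {
        log.info("Restore of {} ({}) starts at offset {} and ends at {}",
                storeName, topicPartition, startingOffset, endingOffset);
    }

    @Override
    public void onBatchRestored(TopicPartition topicPartition, String storeName,
                                long batchEndOffset, long numRestored) {
        log.info("Restored a batch of {} records for {} up to offset {}",
                numRestored, storeName, batchEndOffset);
    }

    @Override
    public void onRestoreEnd(TopicPartition topicPartition, String storeName,
                             long totalRestored) {
        log.info("Restore of {} finished, {} records restored in total", storeName, totalRestored);
    }
});
streams.start();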
Topology:
topology.addSource(AGENT_ADMIN_SOURCE, Serdes.String().deserializer(), serde.deserializer(), inboundAdminTopic)
        .addSource(MS_SOURCE_MATCH, Serdes.String().deserializer(), kafkaAvroDeserializer(), inboundMatchTopic)
        .addProcessor(MsProcessor.PROCESSOR_NAME,
                () -> new MsProcessor(msActionManager(), stateStoreDAO()),
                AGENT_ADMIN_SOURCE, MS_SOURCE_MATCH)
        .addStateStore(agentAdminStateStore(), MsProcessor.PROCESSOR_NAME)       // agent-state-store
        .addStateStore(resourceMatchingStateStore(), MsProcessor.PROCESSOR_NAME) // service-matches-state-store
        .addSink(...)
        .addSink(...);
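For reference, agentAdminStateStore() and resourceMatchingStateStore() return store builders for the two stores that back the changelog topics seen in the logs (agent-state-store and service-matches-state-store). A simplified sketch of what such a builder looks like; the actual store type, serdes and value classes in our code differ:

// Simplified sketch of one of the store builders; the store name is taken from the
// changelog topic in the logs, but AgentAdmin and agentAdminSerde are placeholders.
private StoreBuilder<KeyValueStore<String, AgentAdmin>> agentAdminStateStore() {
    return Stores.keyValueStoreBuilder(
                    Stores.persistentKeyValueStore("agent-state-store"),
                    Serdes.String(),
                    agentAdminSerde)                          // placeholder serde
            .withLoggingEnabled(Collections.emptyMap());      // backs myService-agent-state-store-changelog
}
// resourceMatchingStateStore() is analogous for "service-matches-state-store".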
// when the service is installed, this pod gets the standby task
{"@timestamp":"2020-08-31T10:50:17.253+00:00","@version":"1","message":"stream-thread [myService-418e5529-d591-46b3-b1f4-1da74e3926f4-StreamThread-1] partition assignment took 10 ms.\n\tcurrently assigned active tasks: []\n\tcurrently assigned standby tasks: [0_0]\n\trevoked active tasks: []\n\trevoked standby tasks: []\n","logger_name":"org.apache.kafka.streams.processor.internals.StreamThread","thread_name":"myService-418e5529-d591-46b3-b1f4-1da74e3926f4-StreamThread-1","level":"INFO","level_value":20000}
The other pod is then killed. The standby task is expected to resume from the latest offset it had reached in the changelog topic, but it goes to earliest:
{"@timestamp":"2020-08-31T10:50:17.302+00:00","@version":"1","message":"[Consumer clientId=myService-418e5529-d591-46b3-b1f4-1da74e3926f4-StreamThread-1-restore-consumer, groupId=null] Subscribed to partition(s): myService-agent-state-store-changelog-0,myService-service-matches-state-store-changelog-0 ", "logger_name":"org.apache.kafka.clients.consumer.KafkaConsumer","thread_name":"myService-418e5529-d591-46b3-b1f4-1da74e3926f4-StreamThread-1","level":"INFO","level_value":20000}
{"@timestamp":"2020-08-31T10:50:17.306+00:00","@version":"1","message":"[Consumer clientId=myService-418e5529-d591-46b3-b1f4-1da74e3926f4-StreamThread-1-restore-consumer, groupId=null] Seeking to EARLIEST offset of partition myService-agent-state-store-changelog-0","logger_name":"org.apache.kafka.clients.consumer.internals.SubscriptionState","thread_name":"myService-418e5529-d591-46b3-b1f4-1da74e3926f4-StreamThread-1","level":"INFO","level_value":20000}
{"@timestamp":"2020-08-31T10:50:17.306+00:00","@version":"1","message":"[Consumer clientId=myService-418e5529-d591-46b3-b1f4-1da74e3926f4-StreamThread-1-restore-consumer, groupId=null] Seeking to EARLIEST offset of partition myService-service-matches-state-store-changelog-0","logger_name":"org.apache.kafka.clients.consumer.internals.SubscriptionState","thread_name":"myService-418e5529-d591-46b3-b1f4-1da74e3926f4-StreamThread-1","level":"INFO","level_value":20000}
Could you please tell me why the restore consumer seeks to EARLIEST instead of the latest known offset?
Also, why am I getting hundreds of these messages?
{"@timestamp":"2020-08-31T11:50:39.849+00:00","@version":"1","message":"[Consumer clientId=myService-418e5529-d591-46b3-b1f4-1da74e3926f4-StreamThread-1-restore-consumer, groupId=null] Subscribed to partition(s): myService-agent-state-store-changelog-0, myService-service-matches-state-store-changelog-0 ", "logger_name":"org.apache.kafka.clients.consumer.KafkaConsumer","thread_name":"myService-418e5529-d591-46b3-b1f4-1da74e3926f4-StreamThread-1","level":"INFO","level_value":20000}