- spring-cloud-stream-binder-kafka 3.0.9.RELEASE
- spring-boot 2.2.13.RELEASE
Hi, we have a project that uses Spring Cloud Stream with Kafka, and our consumers fail to reconnect after the broker nodes have been down for a period of time.
The consumer cannot reconnect and acquire its partitions because it tries to check the offset position of a partition that is no longer assigned to it. How can this happen?
The logs are shown below:
2021-06-09T09:39:25.358Z [mecstkac-45-6gvd4] [WARN] [KafkaConsumerDestination{consumerDestinationName='topicName1', partitions=0, dlqName='null'}.container-0-C-1] [messageKey=] [Consumer clientId=clientid-0, groupId=groupid-v1] Connection to node 2147483644 (hostnode/10.71.34.4:9092) could not be established. Broker may not be available.
2021-06-09T09:42:30.217Z [mecstkac-45-6gvd4] [ERROR] [KafkaConsumerDestination{consumerDestinationName='topicName1', partitions=0, dlqName='null'}.container-0-C-1] [messageKey=] [Consumer clientId=clientid-0, groupId=groupid-v1] User provided listener org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer$ListenerConsumerRebalanceListener failed on invocation of onPartitionsAssigned for partitions [topicName1-1]org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition topicName1-1 could be determined
2021-06-09T09:42:30.217Z [mecstkac-45-6gvd4] [ERROR] [KafkaConsumerDestination{consumerDestinationName='topicName1', partitions=0, dlqName='null'}.container-0-C-1] [messageKey=] Error while processing: nullorg.apache.kafka.common.KafkaException: User rebalance callback throws an error\\n at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:403)\\nCaused by: org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition topicName1-1 could be determined"
2021-06-09T09:43:03.924Z [mecstkac-45-6gvd4] [ERROR] [KafkaConsumerDestination{consumerDestinationName='topicName1', partitions=0, dlqName='null'}.container-0-C-1] [messageKey=] Error while processing: nulljava.lang.IllegalStateException: You can only check the position for partitions assigned to this consumer.\\n at org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1717)
Is it possible that the Kafka binder keeps information about the previously assigned partition and tries to use it, even though a rebalance has already taken place and the partition is now assigned to another consumer?
NOTE: The consumer's partition assignor is the default (RangeAssignor).
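
To make the second error concrete, here is a minimal, self-contained sketch of the situation we suspect. It does not use the real Kafka client or the binder: `StubConsumer` and `positionIfAssigned` are hypothetical names that only model the rule from the stack trace, namely that a position lookup is rejected for a partition that is not in the consumer's current assignment. Whether the binder really keeps a stale partition like this is exactly what we are asking, so treat it as an illustration rather than a diagnosis.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Illustration only: models the assignment check, not the real KafkaConsumer. */
public class PositionGuardSketch {

    /** Hypothetical stub that mimics the check behind the IllegalStateException in the logs. */
    static class StubConsumer {
        private final Set<String> assigned = new HashSet<>();

        // Replaces the current assignment, as a rebalance would.
        void rebalanceTo(Set<String> partitions) {
            assigned.clear();
            assigned.addAll(partitions);
        }

        // Returns a copy of the partitions currently assigned.
        Set<String> assignment() {
            return new HashSet<>(assigned);
        }

        // Rejects lookups for partitions that are not currently assigned.
        long position(String partition) {
            if (!assigned.contains(partition)) {
                throw new IllegalStateException(
                        "You can only check the position for partitions assigned to this consumer.");
            }
            return 0L; // the stub tracks no real offsets
        }
    }

    /** Looks up the position only if the partition is still assigned; otherwise returns empty. */
    static Optional<Long> positionIfAssigned(StubConsumer consumer, String partition) {
        if (!consumer.assignment().contains(partition)) {
            return Optional.empty();
        }
        return Optional.of(consumer.position(partition));
    }

    public static void main(String[] args) {
        StubConsumer consumer = new StubConsumer();
        consumer.rebalanceTo(Collections.singleton("topicName1-1")); // first assignment
        String remembered = "topicName1-1";                          // partition cached by the caller
        consumer.rebalanceTo(Collections.emptySet());                // partition moves to another consumer

        try {
            consumer.position(remembered);                           // stale lookup is rejected
        } catch (IllegalStateException e) {
            System.out.println("stale call: " + e.getMessage());
        }
        System.out.println("guarded call: " + positionIfAssigned(consumer, remembered));
    }
}
```

In the sketch, the unguarded call on the remembered partition fails in the same way as the last log line, while the guarded call simply skips the partition. With the real client, the analogous check would compare against `KafkaConsumer.assignment()` before calling `position`.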