What happens when a Kafka broker fails with respect to consumer group coordination?

Question

As described. Assume I have 3 brokers. When I connect as a consumer, one of the brokers becomes the group coordinator. I then kill that broker (or it dies). If I try to reconnect to a broker right away, I get a "coordinator unavailable" error.

How does Kafka know that the broker died, and how long does it take to assign a new coordinator? And how is it configured?

This should be in the docs, but I could not find it.

Were you able to verify recovery after a group coordinator failure? I have a test topology with 3 brokers, and once the consumer group coordinator crashes, the topic is rebalanced correctly, but the consumer group stops receiving messages. Using version 0.11.0.1 — Yamada, Oct 18 '17 at 17:04
@Yamada yes, but it took time. It depends on the level at which your client operates. It is possible to rely on the coordinator, or to do it manually in the client. I have come across both, and it has led to no end of headaches. — deitch, Oct 19 '17 at 07:05

score 0 · Answer 1 · answered Aug 08 '17 at 21:43

I believe Zookeeper handles everything you are asking about.

A critical dependency of Apache Kafka is Apache Zookeeper, which is a distributed configuration and synchronization service. Zookeeper serves as the coordination interface between the Kafka brokers and consumers. The Kafka servers share information via a Zookeeper cluster. Kafka stores basic metadata in Zookeeper such as information about topics, brokers, consumer offsets (queue readers) and so on.

Since all the critical information is stored in Zookeeper, which normally replicates this data across its ensemble, the failure of a Kafka broker or a Zookeeper node does not affect the state of the Kafka cluster. Kafka restores the state once Zookeeper restarts, which gives zero downtime for Kafka. Leader election among the Kafka brokers is also done through Zookeeper when a leader fails.

I would suggest going through this blog post:

The role of Apache ZooKeeper in Apache Kafka
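
To make the failure-detection step concrete, here is a toy model (not Kafka or ZooKeeper code) of how a session timeout turns a silent broker into a detected failure. In a ZooKeeper-based cluster each broker holds an ephemeral registration that survives only while its session keeps heartbeating, and the broker setting `zookeeper.session.timeout.ms` bounds how long a dead broker can go unnoticed. The class and method names below are hypothetical, and 6000 ms is only an example value.

```python
class EphemeralRegistry:
    """Toy stand-in for ZooKeeper's ephemeral broker registrations (hypothetical)."""

    def __init__(self, session_timeout_ms):
        self.session_timeout_ms = session_timeout_ms
        self.last_heartbeat_ms = {}  # broker id -> time of last heartbeat

    def heartbeat(self, broker_id, now_ms):
        """Record that a broker's session was alive at now_ms."""
        self.last_heartbeat_ms[broker_id] = now_ms

    def expire_sessions(self, now_ms):
        """Drop brokers whose session timed out and return their ids."""
        dead = sorted(
            broker_id
            for broker_id, seen_ms in self.last_heartbeat_ms.items()
            if now_ms - seen_ms > self.session_timeout_ms
        )
        for broker_id in dead:
            del self.last_heartbeat_ms[broker_id]
        return dead


registry = EphemeralRegistry(session_timeout_ms=6000)  # example value only
for broker in (1, 2, 3):
    registry.heartbeat(broker, now_ms=0)

# Brokers 1 and 3 keep heartbeating; broker 2 was killed right after t=0.
registry.heartbeat(1, now_ms=5000)
registry.heartbeat(3, now_ms=5000)

early = registry.expire_sessions(now_ms=5500)  # broker 2 is still within the timeout
late = registry.expire_sessions(now_ms=6500)   # broker 2's session has now expired
print(early, late)
```

In the real system, the disappearance of the registration is what lets the controller broker react and move partition leadership to surviving replicas, so a client that reconnects before the timeout has elapsed can still see the old, dead coordinator.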

answered Aug 08 '17 at 21:43

marvel308


@marvel308 __consumer_offsets is stored at the broker level in the latest Kafka release (0.11.0.0). So broker state can affect the state of the entire cluster. – FindingTheOne Aug 09 '17 at 02:41
@marvel308 not really. Zookeeper is just the state store (replicated + some logic). But an actual broker acts as the coordinator. I want to know how Kafka manages recognition of loss of coordinator leader, how it assigns a new coordinator, and how to configure it. – deitch Aug 09 '17 at 04:20
Every broker has information about the list of topics (and partitions) and their leaders, which Zookeeper keeps up to date whenever a new leader is elected or the number of partitions changes. – marvel308 Aug 09 '17 at 04:30
@FindingTheOne yes, I'm aware; that is why I asked for the version number in the comments. – marvel308 Aug 09 '17 at 04:30
Hi @marvel308, do you recommend using a supervisory process to manage the Zookeeper nodes? – jumping_monkey Dec 19 '19 at 02:18

FindingTheOne · Answer 2 · 2017-08-09T15:04:28.053

0

I recommend reading the following StackOverflow post and Kafka Confluence wiki pages to understand the internals:

Coordinator Failover

Group Coordinator and Consumer Group

Kafka Client-side Assignment Proposal

There is a slight variation in newer versions of Kafka: __consumer_offsets (used to store consumer offsets) is stored on the brokers instead of in Zookeeper, and the coordinator together with Zookeeper is used to maintain consumer group state.
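
As a rough illustration of how a group is tied to a coordinator: to the best of my understanding, the broker picks one partition of `__consumer_offsets` by hashing the group id modulo `offsets.topic.num.partitions` (50 by default), and the broker leading that partition acts as the group's coordinator. The sketch below reproduces that mapping in Python. It is not Kafka source code, the function names are hypothetical, and the group id is made up.

```python
def java_string_hash(s):
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    data = s.encode("utf-16-be")  # Java hashes UTF-16 code units
    h = 0
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        h = (31 * h + unit) & 0xFFFFFFFF  # emulate 32-bit overflow
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id, num_partitions=50):
    """Partition of __consumer_offsets assumed to hold this group's state.

    num_partitions mirrors offsets.topic.num.partitions (default 50).
    """
    h = java_string_hash(group_id)
    # Assumption: the minimum int maps to 0, since its absolute value overflows in Java.
    non_negative = 0 if h == -2**31 else abs(h)
    return non_negative % num_partitions


# Hypothetical group id, used only for illustration.
print(offsets_partition_for("my-consumer-group"))
```

Under this reading, a group gets a new coordinator once a surviving replica of that partition becomes leader and loads the group's state, which presupposes that `offsets.topic.replication.factor` is greater than 1.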

edited Aug 09 '17 at 15:04

answered Aug 09 '17 at 02:50

FindingTheOne


Neither of those describes what happens when the broker that is the coordinator fails. – deitch Aug 09 '17 at 04:20
I did find something on coordinator fail-over in this [link](https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Detailed+Consumer+Coordinator+Design#KafkaDetailedConsumerCoordinatorDesign-10.OnCoordinatorFailover). – FindingTheOne Aug 09 '17 at 05:19
That is closer. I am looking for the actual mechanism and the configuration control. – deitch Aug 09 '17 at 05:54
"Whenever the current coordinator's hosted server dies, other coordinator's elector will realize that through the ZK listener and will try to re-elect to be the leader, and whoever wins will trigger the coordinator startup procedure." This does suggest that it is done via Zookeeper, no? – marvel308 Aug 09 '17 at 07:59
@deitch I don't see any configuration for coordinator control in the Kafka documentation. The next steps could be checking the logs on a test run and checking the Kafka code. – FindingTheOne Aug 09 '17 at 15:02
@marvel308 yeah, I guess that is fair. Still, it isn't _configured_ via ZK. – deitch Aug 09 '17 at 15:33
@FindingTheOne me neither. Hence I asked. – deitch Aug 09 '17 at 15:34
Kafka nodes are supposed to be individual entities; I don't think there is any other way for them to know about each other except through Zookeeper. – marvel308 Aug 09 '17 at 15:54
