
I've configured a cluster of Kafka brokers and a cluster of ZooKeeper instances using the kafka_2.11-1.1.0 distribution archive.

For the Kafka brokers I've configured config/server.properties:

broker.id=1
zookeeper.connect=box1:2181,box2:2181,box3:2181
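
broker.id is unique per broker, so box2 and box3 use 2 and 3 in their own copies of server.properties; for example, box3's file contains (only the cluster-relevant lines shown):

broker.id=3
zookeeper.connect=box1:2181,box2:2181,box3:2181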

For the ZooKeeper instances I've configured config/zookeeper.properties:

server.1=box1:2888:3888
server.2=box2:2888:3888
server.3=box3:2888:3888
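
Each instance also has a myid file in its dataDir whose content matches its server.N number (standard ZooKeeper setup; /tmp/zookeeper is the default dataDir in the distribution's config/zookeeper.properties). For example, on box2:

echo 2 > /tmp/zookeeper/myid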

I've created a basic producer and a basic consumer, and I don't understand why I am still able to write and read messages even after I shut down all the ZooKeeper instances while keeping all the Kafka brokers up and running. Even booting up new consumers and producers works without any issue.

I thought having a quorum of ZooKeeper instances was vital for a Kafka cluster.

For both the consumer and the producer, I've used the following configuration:

bootstrap.servers=box1:9092,box2:9092,box3:9092
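
The producer is nothing special, roughly like this minimal sketch (the topic name test is just a placeholder):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BasicProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "box1:9092,box2:9092,box3:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Note: only brokers are listed here; the client never contacts ZooKeeper
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "hello"));
        }
    }
}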

Thanks

Cristi

1 Answer


I thought having a quorum of ZooKeeper instances was vital for a Kafka cluster.

A ZooKeeper quorum is vital for managing partition metadata, leader election, and similar concerns. In general, ZooKeeper is necessary for the management work performed by the cluster controller.

Basically, right now (with ZooKeeper down) you cannot modify topics (the partition metadata is stored in ZooKeeper), start up or shut down brokers (they use ZooKeeper for discovery and registration), or perform other similar operations.
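
For example, topic management in this Kafka version goes through ZooKeeper directly, so with the whole ensemble down a command like the following (topic name is illustrative) will simply fail with a ZooKeeper connection timeout:

bin/kafka-topics.sh --create --zookeeper box1:2181,box2:2181,box3:2181 \
    --topic new-topic --partitions 3 --replication-factor 3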

Even booting up new consumers and producers works without any issue.

Producer and consumer operations reach out to the brokers only. A broker can still append to its log, and can still communicate with the other brokers to keep replication going. So it is possible to send a message, have it received by the leader broker and saved to disk, and have the other brokers replicate it: followers continuously send fetch requests to the leader, and they still know which broker leads each partition because they cached that metadata while ZooKeeper was running.
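
The same goes for consumers: since the 0.9 consumer API, group membership, rebalancing and offset commits are handled by a broker-side group coordinator (offsets live in the internal __consumer_offsets topic), so the consumer never talks to ZooKeeper either. A minimal sketch matching the setup above (topic name is an assumption):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BasicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "box1:9092,box2:9092,box3:9092");
        props.put("group.id", "test-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                // Group coordination and offset commits go to a broker, not to ZooKeeper
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
            }
        }
    }
}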

Adam Kotwasinski
  • Thanks for your input. Is it fair to say that at this point, if any Kafka broker goes down, both producers and consumers will stop working, since a rebalance will not be possible without ZooKeeper to keep the metadata (the partition-to-consumer map)? – Cristi Sep 25 '18 at 13:53
  • I think that's implementation-dependent (the Kafka docs don't describe degenerate scenarios like this), but I wouldn't be surprised - most probably you'd see some kind of `TimeoutException` about fetching metadata or so. – Adam Kotwasinski Sep 25 '18 at 13:56
  • It is quite clear to me why administration operations and producing messages behave this way while the ZooKeeper instances are all down. For the consumer part it is still not really clear what happens behind the scenes. I suppose the rebalance is done by the Kafka broker itself (ZooKeeper just triggers it at some point based on some conditions, like a Kafka broker going down), but all the information (like the consumer-group-to-partitions map) is kept by the Kafka broker itself. I did some tests, and creating a new consumer with the same/different consumer group id behaved normally, even with ZooKeeper down. – Cristi Sep 26 '18 at 11:01
  • If the controller and a lead replica go down while ZooKeeper is down, then you'd start seeing many more failure conditions. – OneCricketeer Sep 26 '18 at 13:44