I have an Apache Kafka cluster with 3 brokers, and I would like to detect when the cluster is no longer available so that I can switch the client connection to a second, replicated cluster (as described here: How to consume from two different clusters in Kafka?).
All topics on the cluster have a replication factor of 3, so all data should remain available within the cluster if a single node fails.
In this case, the cluster can be considered unusable if 2 brokers are offline. I am using the Confluent.Kafka NuGet package (https://www.nuget.org/packages/Confluent.Kafka/) to create a .NET client. However, with both the Producer and the Consumer client, it is only possible to detect when all the brokers are down (by checking for the Local_AllBrokersDown error code).
One solution would be to have a producer that continuously produces messages to a topic in order to 'heartbeat' the cluster. With the replication factor of 3, I set min.insync.replicas for that specific topic to 2. According to the documentation, if the producer uses Acks=All, I should receive a NotEnoughReplicas error code when trying to publish a message while fewer than 2 replicas are in sync.
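To make the expected behaviour concrete, here is a minimal sketch of the rule as I understand it, written in Python only for brevity (it does not use any Kafka client). The function name `expected_produce_result` and the returned strings are hypothetical and chosen for illustration. The model also assumes that the in-sync replica set has already shrunk to the brokers that are still alive and that a partition leader is still reachable.

```python
REPLICATION_FACTOR = 3   # replication factor of the heartbeat topic
MIN_INSYNC_REPLICAS = 2  # min.insync.replicas set on the heartbeat topic


def expected_produce_result(in_sync_replicas: int,
                            min_insync: int = MIN_INSYNC_REPLICAS) -> str:
    """Expected outcome of a produce request sent with acks=all.

    Assumption: the leader rejects the write when the current in-sync
    replica count is below min.insync.replicas.
    """
    if in_sync_replicas < min_insync:
        return "NotEnoughReplicas"
    return "NoError"


if __name__ == "__main__":
    # Only 0..2 brokers down are modelled; with all 3 down there is no
    # leader left to answer the request at all.
    for brokers_down in range(REPLICATION_FACTOR):
        isr = REPLICATION_FACTOR - brokers_down
        print(f"{brokers_down} broker(s) down, ISR={isr}:",
              expected_produce_result(isr))
```

Under these assumptions, the heartbeat would only start failing with NotEnoughReplicas once 2 brokers are down, which is exactly the condition I want to detect.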
In practice, when 2 brokers go offline, my client application is connected only to the one remaining broker, which cannot form a cluster by itself. If I use KafkaManager against this remaining broker, it still reports that it is connected to another broker and that the topic has 2 in-sync replicas. The .NET client therefore does not receive the NotEnoughReplicas error code (only the Local_TimedOut error code from the remaining online broker). This might be by design, in order to avoid split-brain...
Does anybody have an idea of how I could monitor the availability of such a cluster, specifically in the case where 2 brokers are down?
Thank you!