I am seeing org.apache.kafka.common.errors.NotLeaderForPartitionException on my producer, which I understand happens when the producer tries to produce messages to a broker that is not the leader for the partition. Does that mean that each time a broker fulfills a write request, it first checks whether it is the leader? If so, does that translate to a ZooKeeper request for every write request to find out whether the node is the leader?
1 Answer
How the Producer Gets Metadata About Brokers
The producer sends a metadata request with a list of topics to one of the brokers you supplied when configuring the producer.
The response from the broker contains a list of partitions in those topics and the leader for each partition. The producer caches this information, so it knows which broker to send each message to.
When the Producer Refreshes Metadata
I think this depends on which Kafka client you use. There are small differences between the Ruby, Java, and other Kafka clients. For example, in Java:
- The producer fetches metadata when the client initializes, then updates it periodically based on an expiration time.
- The producer also forces a metadata update when a request error occurs, such as `InvalidMetadataException` (NotLeaderForPartitionException is one of its subclasses).
The ruby-kafka client, in contrast, usually refreshes metadata only on initialization or when an error occurs.
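
To make the cache-and-refresh behaviour described above concrete, here is a toy simulation in plain Java. It is only a sketch: `Cluster`, `Producer`, and `NotLeaderException` are hypothetical stand-ins invented for this example, not Kafka's real classes, and the toy broker's leadership check is assumed to be a lookup in its own in-memory state rather than a ZooKeeper call.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are part of the Kafka client or broker.
class MetadataRefreshSketch {

    /** Hypothetical stand-in for the "not leader for partition" error. */
    static class NotLeaderException extends RuntimeException {
        NotLeaderException(String message) {
            super(message);
        }
    }

    /** Toy cluster holding the current partition -> leader broker id mapping. */
    static class Cluster {
        final Map<Integer, Integer> leaders = new HashMap<>();
        int metadataRequests = 0;

        /** Returns a snapshot of the leader map, like a metadata response. */
        Map<Integer, Integer> fetchMetadata() {
            metadataRequests++;
            return new HashMap<>(leaders);
        }

        /** The targeted broker rejects the write if it does not lead the partition. */
        void append(int brokerId, int partition, String record) {
            Integer leader = leaders.get(partition);
            if (leader == null || leader != brokerId) {
                throw new NotLeaderException(
                        "broker " + brokerId + " is not the leader for partition " + partition);
            }
            // A real broker would append the record to its log here.
        }
    }

    /** Toy producer that caches leaders and refreshes the cache on error. */
    static class Producer {
        private final Cluster cluster;
        private Map<Integer, Integer> cachedLeaders;

        Producer(Cluster cluster) {
            this.cluster = cluster;
            this.cachedLeaders = cluster.fetchMetadata(); // initial fetch
        }

        /** Sends to the cached leader; returns the id of the broker that accepted. */
        int send(int partition, String record, int maxRetries) {
            for (int attempt = 0; ; attempt++) {
                int target = cachedLeaders.get(partition);
                try {
                    cluster.append(target, partition, record);
                    return target;
                } catch (NotLeaderException e) {
                    if (attempt >= maxRetries) {
                        throw e;
                    }
                    cachedLeaders = cluster.fetchMetadata(); // forced refresh, then retry
                }
            }
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        cluster.leaders.put(0, 1);            // broker 1 leads partition 0
        Producer producer = new Producer(cluster);
        cluster.leaders.put(0, 2);            // leadership moves after the producer cached it
        int acceptedBy = producer.send(0, "hello", 3);
        System.out.println("accepted by broker " + acceptedBy);
        System.out.println("metadata requests: " + cluster.metadataRequests);
    }
}
```

In this toy run the first send goes to the stale leader, is rejected, triggers one extra metadata fetch, and the retry reaches the new leader. The periodic refresh is left out to keep the sketch short.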

spike 王建
-
I understand that the producer always tries to connect to the leader to the best of its knowledge. But the leader could have changed between the time the producer fetched the leader and the time it produced the message, hence the NotLeaderForPartitionException. But where is this exception coming from? Is the broker always checking whether it is the leader before replying? – Avikant Gupta Aug 17 '20 at 14:11
-
@AvikantGupta The Kafka producer retrieves and caches topic/partition metadata before the first send. It then periodically tries to refresh this metadata: every metadata.max.age.ms (default 5 minutes) for "good" topics and every retry.backoff.ms for "invalid" topics. These metadata refresh attempts are what you're observing in the log. – spike 王建 Aug 17 '20 at 14:20
-
Thank you for your response and for linking the source! So the metadata that the producer fetches every 5 minutes is the only validation in Kafka of which broker is the leader. Then a Kafka producer should be able to produce a message to a non-leader broker if a leader change happens within those 5 minutes. Is this understanding correct? – Avikant Gupta Aug 17 '20 at 15:19
-
@AvikantGupta Notably, when I double-checked the source code and documentation, I found that the Java Kafka client also forces a metadata update on a request error, in addition to the periodic update. – spike 王建 Aug 17 '20 at 15:54
-
So because of this edge case, the highest consistency settings that we can configure in Kafka, as shown here: https://stackoverflow.com/a/54153523/12052871, are not even sufficient to use Kafka as a consistent system. Being such a big fan of Kafka, I find this a little disheartening. – Avikant Gupta Aug 18 '20 at 08:43
-
I happened to stumble upon KIP-500, the almighty JIRA to remove the ZooKeeper dependency. There it says "In the current world, a broker which can contact ZooKeeper but which is partitioned from the controller will continue serving user requests, but will not receive any metadata updates. This can lead to some confusing and difficult situations. For example, a producer using acks=1 might continue to produce to a leader that actually was not the leader any more, but which failed to receive the controller's LeaderAndIsrRequest moving the leadership." – Avikant Gupta Aug 25 '20 at 06:03
-
@AvikantGupta I think Kafka is sufficient as a consistent message queue system. The Kafka broker acks the producer's messages once their replicas have synced (you can configure this with `min.insync.replicas`). If the broker does not ack a message, the producer retries. I also don't think KIP-500 is bad news: using ZooKeeper to keep metadata leads to an unnecessarily steep learning curve and increases the risk of a misconfiguration causing a security breach. Kafka's partitions are sufficient to store the metadata themselves. – spike 王建 Aug 27 '20 at 02:37
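
For reference, the producer-side settings mentioned in this thread could be collected roughly as follows. This is a sketch using only `java.util.Properties`: the class and method names are made up for the example, `localhost:9092` is a placeholder address, and the values shown for `metadata.max.age.ms` and `retry.backoff.ms` are simply the documented defaults written out. Note that `min.insync.replicas` is a broker/topic setting, so it does not appear in the producer properties; it takes effect for writes sent with `acks=all`.

```java
import java.util.Properties;

// Hypothetical helper class; only the property keys are real Kafka producer settings.
class ProducerConfigSketch {

    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        // Brokers used for the initial metadata request.
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Wait for all in-sync replicas before a write is acknowledged.
        props.setProperty("acks", "all");
        // Refresh cached metadata at least every 5 minutes (the default).
        props.setProperty("metadata.max.age.ms", "300000");
        // Back-off between retries (100 ms is the documented default).
        props.setProperty("retry.backoff.ms", "100");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder, not an address taken from the thread.
        Properties props = producerProps("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting `Properties` object would then be passed to the producer constructor, together with the key and value serializer settings, which are omitted here.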