Kafka topic partitions with leader -1

Question

I noticed that a few of my Kafka topics are behaving in a way I cannot clearly explain.

For example:

./kafka-topics.sh --describe --zookeeper ${ip}:2181 --topic test

Topic:test  PartitionCount:3    ReplicationFactor:1 Configs:retention.ms=1209600000
    Topic: test Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: test Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: test Partition: 2    Leader: 3   Replicas: 3 Isr: 3

I am particularly concerned about Partition 1, which shows Leader: -1.

I also notice that roughly 1/3 of the messages produced to this topic fail with a timeout. I believe this is a consequence of one of the three partitions not having a leader.
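As a rough sanity check of the one-third figure, here is a minimal sketch. It assumes messages are spread evenly over the partitions in round-robin fashion, which is an assumption about the partitioner and not something verified on this cluster. The function name `failed_fraction` is made up for illustration.

```python
from collections import Counter


def failed_fraction(num_messages, num_partitions, leaderless):
    """Fraction of messages that land on a partition without a leader.

    Assumes round-robin assignment (message i goes to partition
    i % num_partitions); a real partitioner may distribute differently.
    """
    per_partition = Counter(i % num_partitions for i in range(num_messages))
    failed = sum(per_partition[p] for p in leaderless)
    return failed / num_messages


# 3 partitions, partition 1 has no leader (as in the describe output above)
print(failed_fraction(3000, 3, {1}))
```

Under that assumption, one leaderless partition out of three accounts for about a third of the produced messages, which matches what I observe.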

Does anyone have insight into why this happens, and how to recover from it in a production scenario without losing data?

EDIT: I am using the librdkafka-based Python producer, and the error message I see is: Message failed delivery: KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}

Is your second broker up and running? It would be useful if you could also post the error message for the timeout. — Giorgos Myrianthous, Aug 20 '18 at 12:00
I edited the original post with the error message I receive. I have a 3-node broker cluster, and it indeed looks like one of the nodes is down. However, producing to and consuming from other topics (which do not have Leader: -1) seems to work as expected with failover. — irrelevantUser, Aug 20 '18 at 12:22
"in a production scenario", you would have a replication factor of at least 3, and a min.insync.replicas setting of 2 on the topic or broker (with acks=all in the producer) — OneCricketeer, Aug 20 '18 at 13:06

score 3 · Accepted Answer · answered Aug 20 '18 at 12:38

Most probably your second Kafka broker is down. To check which Kafka brokers are active, run:

./zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids"

And the output should be similar to the one below:

Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /brokers/ids
[0, 1, 2]
[zk: localhost:2181(CONNECTED) 1]

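If the second broker is not listed among the active brokers, you need to figure out why it is not up and running (its logs should tell you if something went wrong). I would also suggest increasing the replication factor, since you have a multi-broker configuration: with a replication factor of 1, a partition has no other replica that could take over as leader when its only broker goes down.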
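To cross-check the two outputs, a minimal Python sketch could compare the replica list of each partition against the broker ids reported by ZooKeeper. It assumes the `--describe` output has the same shape as in the question, and that `ls /brokers/ids` returned only brokers 1 and 3; both are assumptions for illustration. The function name `partitions_without_live_replica` is hypothetical.

```python
import re

# Matches one partition line of `kafka-topics.sh --describe` output,
# in the format shown in the question.
PARTITION_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>[\d,]+)"
)


def partitions_without_live_replica(describe_output, live_brokers):
    """Return (topic, partition, replicas) for partitions whose replicas
    are all on brokers that are not in live_brokers."""
    stranded = []
    for line in describe_output.splitlines():
        match = PARTITION_RE.search(line)
        if match is None:
            continue  # topic summary line or unrelated output
        replicas = [int(r) for r in match.group("replicas").split(",")]
        if not any(r in live_brokers for r in replicas):
            stranded.append(
                (match.group("topic"), int(match.group("partition")), replicas)
            )
    return stranded


describe_output = """\
Topic:test  PartitionCount:3    ReplicationFactor:1 Configs:retention.ms=1209600000
    Topic: test Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: test Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: test Partition: 2    Leader: 3   Replicas: 3 Isr: 3
"""
live_brokers = {1, 3}  # assumed result of `ls /brokers/ids` with broker 2 down

print(partitions_without_live_replica(describe_output, live_brokers))
# prints [('test', 1, [2])]
```

Any partition reported here cannot elect a leader until one of its replica brokers comes back, which is why bringing broker 2 back up (rather than reassigning the partition) is the way to recover without losing the data stored on it.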

score 1 · Answer 2 · answered Aug 20 '18 at 12:16

This often indicates that the broker leading that partition is offline. I would check the offline partitions metric to confirm this, but also check whether broker 2 is currently functional.

Simon Clark


score 0 · Answer 3 · answered Aug 29 '23 at 01:37

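In recent versions of Kafka, the leader of an offline partition is reported as none instead of -1. To list offline partitions, run the command below (the match is case-insensitive because the capitalization may differ between versions):

./kafka-topics.sh --zookeeper <zookeeper_ip>:2181 --describe | grep -i "Leader: none"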
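If you need to handle both the older -1 and the newer marker, the same filter can be sketched in Python. This is only an illustration: the sample text below is made up and mixes both formats on purpose, and `leaderless_partitions` is a hypothetical helper name.

```python
import re

# Captures topic, partition number and the leader field from one
# partition line of `kafka-topics.sh --describe` output.
LEADER_RE = re.compile(r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\S+)")


def leaderless_partitions(describe_output):
    """Return (topic, partition) pairs whose leader is -1 or none."""
    result = []
    for line in describe_output.splitlines():
        match = LEADER_RE.search(line)
        # Compare case-insensitively so "None" and "none" both match.
        if match and match.group(3).lower() in ("-1", "none"):
            result.append((match.group(1), int(match.group(2))))
    return result


# Made-up sample that mixes the old and new markers for demonstration.
sample = """\
    Topic: test Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: test Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: test Partition: 2    Leader: none    Replicas: 3 Isr: 3
"""

print(leaderless_partitions(sample))
# prints [('test', 1), ('test', 2)]
```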

best wishes
