
I have a consumer group reading from a topic with ten partitions:

[root@kafka01 kafka]# ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group ssIncomingGroup --describe

GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                                                        HOST            CLIENT-ID
ssIncomingGroup ssIncoming      3          13688           13987           299             ssTS@influx01 (github.com/segmentio/kafka-go)-f1c5b4c7-9cf0-4132-902a-db9d0429d520 /192.168.33.10  ssTS@influx01 (github.com/segmentio/kafka-go)
ssIncomingGroup ssIncoming      7          13484           13868           384             ssTS@influx01 (github.com/segmentio/kafka-go)-f1c5b4c7-9cf0-4132-902a-db9d0429d520 /192.168.33.10  ssTS@influx01 (github.com/segmentio/kafka-go)
ssIncomingGroup ssIncoming      2          13322           13698           376             ssTS@influx01 (github.com/segmentio/kafka-go)-20ee82a9-825d-4d9a-9f20-f4610c21f171 /192.168.33.10  ssTS@influx01 (github.com/segmentio/kafka-go)
ssIncomingGroup ssIncoming      8          13612           13899           287             ssTS@influx01 (github.com/segmentio/kafka-go)-20ee82a9-825d-4d9a-9f20-f4610c21f171 /192.168.33.10  ssTS@influx01 (github.com/segmentio/kafka-go)
ssIncomingGroup ssIncoming      1          13568           13932           364             ssTS@influx01 (github.com/segmentio/kafka-go)-df68ca85-d722-47ef-82c2-2fd60e186fac /192.168.33.10  ssTS@influx01 (github.com/segmentio/kafka-go)
ssIncomingGroup ssIncoming      6          13651           13950           299             ssTS@influx01 (github.com/segmentio/kafka-go)-df68ca85-d722-47ef-82c2-2fd60e186fac /192.168.33.10  ssTS@influx01 (github.com/segmentio/kafka-go)
ssIncomingGroup ssIncoming      0          13609           13896           287             ssTS@influx01 (github.com/segmentio/kafka-go)-10b7f10f-9535-4338-9851-f583a9a7c935 /192.168.33.10  ssTS@influx01 (github.com/segmentio/kafka-go)
ssIncomingGroup ssIncoming      5          13646           13945           299             ssTS@influx01 (github.com/segmentio/kafka-go)-10b7f10f-9535-4338-9851-f583a9a7c935 /192.168.33.10  ssTS@influx01 (github.com/segmentio/kafka-go)
ssIncomingGroup ssIncoming      4          13543           13843           300             ssTS@influx01 (github.com/segmentio/kafka-go)-3c847add-172f-4007-adf2-ce486686dd7c /192.168.33.10  ssTS@influx01 (github.com/segmentio/kafka-go)
ssIncomingGroup ssIncoming      9          13652           13951           299             ssTS@influx01 (github.com/segmentio/kafka-go)-3c847add-172f-4007-adf2-ce486686dd7c /192.168.33.10  ssTS@influx01 (github.com/segmentio/kafka-go)

I am using the Segment.io Kafka library for Go: "github.com/segmentio/kafka-go".

My Kafka writer looks like this:

kafkaWriter := kafka.NewWriter(kafka.WriterConfig{
    Async:         false,
    Brokers:       config.KafkaHosts,  // a string slice of 4 Kafka hosts
    QueueCapacity: kafkaQueueCapacity,
    Topic:         kafkaTopic,
    Balancer: &kafka.LeastBytes{},  // Same result with the default round-robin balancer
})
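
For reference, the produce path is just a WriteMessages call against that writer; below is a simplified sketch (produceOne and the JSON payload are placeholders, not the real producer code, and the usual context import is assumed). Since no Key is set on the messages, partition selection is left entirely to the balancer.

// produceOne is a simplified, hypothetical sketch of the write path.
// No Key is set, so the configured balancer alone picks the partition
// for each message.
func produceOne(ctx context.Context, w *kafka.Writer) error {
    return w.WriteMessages(ctx, kafka.Message{
        Value: []byte(`{"placeholder":true}`),
    })
}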

My Kafka reader looks like this:

kafkaReader := kafka.NewReader(kafka.ReaderConfig{
    Brokers: config.KafkaHosts,  // same as above
    GroupID: config.KafkaGroup,
    Topic:   config.KafkaTopic,  // same as above
})
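
The consume side is a plain ReadMessage loop on that reader; below is a simplified sketch (consumeLoop is an illustrative name, not the real consumer code, and the usual context and log imports are assumed). Because GroupID is set, every running reader joins the ssIncomingGroup consumer group, gets a share of the ten partitions (two each in the output above), and ReadMessage commits offsets for the group as it goes.

// consumeLoop is a simplified, hypothetical sketch of the read path.
func consumeLoop(ctx context.Context, r *kafka.Reader) {
    for {
        msg, err := r.ReadMessage(ctx) // blocks until a message arrives
        if err != nil {
            log.Printf("kafka read failed: %v", err)
            return
        }
        log.Printf("partition=%d offset=%d", msg.Partition, msg.Offset)
    }
}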

The topic was initially created like this:

conn.CreateTopics(kafka.TopicConfig{
    NumPartitions:     config.KafkaPartitions,  // == 10
    ReplicationFactor: config.KafkaReplication,  // == 1
    Topic:             kafkaTopic,  // same as above
})

When I run my program and watch host and network load, I see that almost all load / network activity is on one of the four Kafka brokers. When I du the log directories for the Kafka hosts, that same host has much more Kafka data on the FS than the others (for example, 150M as opposed to 15M).

What I want and expect to happen is to have the load distributed among all four Kafka servers, so that one does not become a bottleneck (from CPU or network). Why isn't this happening?

Edit (adding requested command output):

[root@kafka01 kafka]# bin/kafka-topics.sh --describe --bootstrap-server localhost:9092                                                                                                                                    
Topic: ssIncoming       PartitionCount: 10      ReplicationFactor: 1    Configs: flush.ms=1000,segment.bytes=536870912,flush.messages=10000,retention.bytes=1073741824                                                    
        Topic: ssIncoming       Partition: 0    Leader: 4       Replicas: 4     Isr: 4                                                                                                                                    
        Topic: ssIncoming       Partition: 1    Leader: 2       Replicas: 2     Isr: 2                                                                                                                                    
        Topic: ssIncoming       Partition: 2    Leader: 3       Replicas: 3     Isr: 3                       
        Topic: ssIncoming       Partition: 3    Leader: 1       Replicas: 1     Isr: 1
        Topic: ssIncoming       Partition: 4    Leader: 4       Replicas: 4     Isr: 4                       
        Topic: ssIncoming       Partition: 5    Leader: 2       Replicas: 2     Isr: 2                      
        Topic: ssIncoming       Partition: 6    Leader: 3       Replicas: 3     Isr: 3                       
        Topic: ssIncoming       Partition: 7    Leader: 1       Replicas: 1     Isr: 1                       
        Topic: ssIncoming       Partition: 8    Leader: 4       Replicas: 4     Isr: 4                       
        Topic: ssIncoming       Partition: 9    Leader: 2       Replicas: 2     Isr: 2                       
Topic: __consumer_offsets       PartitionCount: 50      ReplicationFactor: 1    Configs: compression.type=producer,cleanup.policy=compact,flush.ms=1000,segment.bytes=104857600,flush.messages=10000,retention.bytes=1073741824
        Topic: __consumer_offsets       Partition: 0    Leader: 4       Replicas: 4     Isr: 4               
        Topic: __consumer_offsets       Partition: 1    Leader: 1       Replicas: 1     Isr: 1                                                                                                                            
        Topic: __consumer_offsets       Partition: 2    Leader: 2       Replicas: 2     Isr: 2               
        Topic: __consumer_offsets       Partition: 3    Leader: 3       Replicas: 3     Isr: 3               
        Topic: __consumer_offsets       Partition: 4    Leader: 4       Replicas: 4     Isr: 4               
        Topic: __consumer_offsets       Partition: 5    Leader: 1       Replicas: 1     Isr: 1
        Topic: __consumer_offsets       Partition: 6    Leader: 2       Replicas: 2     Isr: 2          
        Topic: __consumer_offsets       Partition: 7    Leader: 3       Replicas: 3     Isr: 3      
        Topic: __consumer_offsets       Partition: 8    Leader: 4       Replicas: 4     Isr: 4
        Topic: __consumer_offsets       Partition: 9    Leader: 1       Replicas: 1     Isr: 1
        Topic: __consumer_offsets       Partition: 10   Leader: 2       Replicas: 2     Isr: 2        
        Topic: __consumer_offsets       Partition: 11   Leader: 3       Replicas: 3     Isr: 3
        Topic: __consumer_offsets       Partition: 12   Leader: 4       Replicas: 4     Isr: 4
        Topic: __consumer_offsets       Partition: 13   Leader: 1       Replicas: 1     Isr: 1
        Topic: __consumer_offsets       Partition: 14   Leader: 2       Replicas: 2     Isr: 2        
        Topic: __consumer_offsets       Partition: 15   Leader: 3       Replicas: 3     Isr: 3
        Topic: __consumer_offsets       Partition: 16   Leader: 4       Replicas: 4     Isr: 4
        Topic: __consumer_offsets       Partition: 17   Leader: 1       Replicas: 1     Isr: 1      
        Topic: __consumer_offsets       Partition: 18   Leader: 2       Replicas: 2     Isr: 2               
        Topic: __consumer_offsets       Partition: 19   Leader: 3       Replicas: 3     Isr: 3                                                                                                                            
        Topic: __consumer_offsets       Partition: 20   Leader: 4       Replicas: 4     Isr: 4                                                                                                                            
        Topic: __consumer_offsets       Partition: 21   Leader: 1       Replicas: 1     Isr: 1
        Topic: __consumer_offsets       Partition: 22   Leader: 2       Replicas: 2     Isr: 2
        Topic: __consumer_offsets       Partition: 23   Leader: 3       Replicas: 3     Isr: 3
        Topic: __consumer_offsets       Partition: 24   Leader: 4       Replicas: 4     Isr: 4
        Topic: __consumer_offsets       Partition: 25   Leader: 1       Replicas: 1     Isr: 1
        Topic: __consumer_offsets       Partition: 26   Leader: 2       Replicas: 2     Isr: 2
        Topic: __consumer_offsets       Partition: 27   Leader: 3       Replicas: 3     Isr: 3
        Topic: __consumer_offsets       Partition: 28   Leader: 4       Replicas: 4     Isr: 4
        Topic: __consumer_offsets       Partition: 29   Leader: 1       Replicas: 1     Isr: 1
        Topic: __consumer_offsets       Partition: 30   Leader: 2       Replicas: 2     Isr: 2
        Topic: __consumer_offsets       Partition: 31   Leader: 3       Replicas: 3     Isr: 3                                                                                                                            
        Topic: __consumer_offsets       Partition: 32   Leader: 4       Replicas: 4     Isr: 4               
        Topic: __consumer_offsets       Partition: 33   Leader: 1       Replicas: 1     Isr: 1
        Topic: __consumer_offsets       Partition: 34   Leader: 2       Replicas: 2     Isr: 2
        Topic: __consumer_offsets       Partition: 35   Leader: 3       Replicas: 3     Isr: 3
        Topic: __consumer_offsets       Partition: 36   Leader: 4       Replicas: 4     Isr: 4
        Topic: __consumer_offsets       Partition: 37   Leader: 1       Replicas: 1     Isr: 1
        Topic: __consumer_offsets       Partition: 38   Leader: 2       Replicas: 2     Isr: 2
        Topic: __consumer_offsets       Partition: 39   Leader: 3       Replicas: 3     Isr: 3
        Topic: __consumer_offsets       Partition: 40   Leader: 4       Replicas: 4     Isr: 4
        Topic: __consumer_offsets       Partition: 41   Leader: 1       Replicas: 1     Isr: 1
        Topic: __consumer_offsets       Partition: 42   Leader: 2       Replicas: 2     Isr: 2
        Topic: __consumer_offsets       Partition: 43   Leader: 3       Replicas: 3     Isr: 3
        Topic: __consumer_offsets       Partition: 44   Leader: 4       Replicas: 4     Isr: 4
        Topic: __consumer_offsets       Partition: 45   Leader: 1       Replicas: 1     Isr: 1
        Topic: __consumer_offsets       Partition: 46   Leader: 2       Replicas: 2     Isr: 2
        Topic: __consumer_offsets       Partition: 47   Leader: 3       Replicas: 3     Isr: 3
        Topic: __consumer_offsets       Partition: 48   Leader: 4       Replicas: 4     Isr: 4
        Topic: __consumer_offsets       Partition: 49   Leader: 1       Replicas: 1     Isr: 1

Edit 2: Here are the variables I use to generate the Kafka configuration files. They are the same for each of the four brokers.

scala_version: 2.12
kafka_config_broker_id: 0
kafka_config_log_dirs: "/tmp/kafka_logs"
kafka_config_log_flush_interval_messages: 10000
kafka_config_log_flush_interval_ms: 1000
kafka_config_log_retention_bytes: 1073741824
kafka_config_log_retention_check_interval: 60000
kafka_config_log_retention_hours: 168
kafka_config_log_segment_bytes: 536870912
kafka_config_num_io_threads: 4
kafka_config_num_network_threads: 2
kafka_config_num_partitions: 2
kafka_config_offsets_topic_replication_factor: 1
kafka_config_receive_buffer_bytes: 1048576
kafka_config_request_max_bytes: 104857600
kafka_config_send_buffer_bytes: 1048576
kafka_config_zookeeper_connection_timeout_ms: 1000000
kafka_config_zookeeper_servers:
    - consul01
    - consul02
    - consul03
kafka_exporter_version: 1.2.0
kafka_port: 9092
kafka_version: 2.4.0

This data is used in an Ansible template. The generated Kafka configs look like this:

broker.id=1
port=9092
num.network.threads=2
num.io.threads=4
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka_logs
num.partitions=2
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.hours=168
log.retention.bytes=1073741824
log.segment.bytes=536870912
log.retention.check.interval.ms=60000

# If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.
log.cleaner.enable=false

offsets.topic.replication.factor=1

zookeeper.connect=consul01:2181,consul02:2181,consul03:2181
zookeeper.connection.timeout.ms=1000000

delete.topic.enable=true

Note that this is a development environment and the brokers are respun frequently (several times per day). The issue persists after each respin.

1 Answer


It seems the load is actually balanced quite well now:

  • Partition leaders are distributed across the brokers about as evenly as possible:
    • Broker 1 leads partitions 3 and 7
    • Broker 2 leads partitions 1, 5, and 9
    • Broker 3 leads partitions 2 and 6
    • Broker 4 leads partitions 0, 4, and 8
  • Partitions are also assigned to the consumers evenly (two partitions per consumer)
  • The offsets are nearly the same across the partitions, so it seems you are producing messages to the partitions evenly

When I run du against the log directories on the Kafka hosts, that same host has much more Kafka data on disk than the others (for example, 150M as opposed to 15M).

The log offsets across the partitions are almost the same. But of course brokers 2 and 4 will hold somewhat more data, because, as you can see, each of them leads one more partition than brokers 1 and 3. Their network traffic will also be higher for the same reason, since they each serve three partitions (fetch requests from the consumers plus produce requests from the producer).

Still, ten times more data on a single broker does not make sense. My guess is that at some point one or more brokers were unhealthy (down, or unable to send heartbeats to ZooKeeper), the controller moved partition leadership to the healthy broker(s), and for a while some broker(s) were handling many more partitions than the others. (By the way, auto.leader.rebalance.enable must be true for leadership to have ended up balanced again after such an event.)
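
If you want to double-check the current leadership from the client side, you can pull the partition metadata with the same kafka-go library and count how many partition leaders each broker holds. A rough sketch (the bootstrap address kafka01:9092 is a placeholder; any reachable broker works):

package main

import (
    "fmt"
    "log"

    kafka "github.com/segmentio/kafka-go"
)

func main() {
    // Placeholder bootstrap address.
    conn, err := kafka.Dial("tcp", "kafka01:9092")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // Fetch partition metadata for the topic in question.
    parts, err := conn.ReadPartitions("ssIncoming")
    if err != nil {
        log.Fatal(err)
    }

    // Count how many partitions each broker currently leads.
    leaders := map[int]int{}
    for _, p := range parts {
        leaders[p.Leader.ID]++
    }
    fmt.Println(leaders) // e.g. map[1:2 2:3 3:2 4:3]
}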

Note: I am assuming that the broker configs (especially the log.retention settings, which strongly affect how much data each broker stores) and the system resources of the brokers are the same. If they are not, you should mention it.


By the way, if you are not happy with the current assignment of partitions to brokers, you can change it manually with the kafka-reassign-partitions.sh tool. You just need to create a JSON file that specifies the replicas for each partition.

For example:

{"version":1,
  "partitions":[
     {"topic":"ssIncoming","partition":0,"replicas":[1]},
     {"topic":"ssIncoming","partition":1,"replicas":[1]},
     {"topic":"ssIncoming","partition":2,"replicas":[1]},
     {"topic":"ssIncoming","partition":3,"replicas":[2]},
     {"topic":"ssIncoming","partition":4,"replicas":[2]},
     {"topic":"ssIncoming","partition":5,"replicas":[3]},
     {"topic":"ssIncoming","partition":6,"replicas":[3]},
     {"topic":"ssIncoming","partition":7,"replicas":[3]},
     {"topic":"ssIncoming","partition":8,"replicas":[4]},
     {"topic":"ssIncoming","partition":9,"replicas":[4]}
]}

Then you just need to run this command:

./bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file change-replicas.json --execute
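
After it finishes, the same command can be run with --verify in place of --execute to confirm that the reassignment has completed.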

– H.Ç.T

  • This helps me understand the output of the commands better, I appreciate that. I've added my Kafka configuration variables to the bottom of the question. Does that give any more clues? What you say makes sense but what I see does not (MB/s level throughput on one broker, kB/s magnitude throughput on the others). – Ken - Enough about Monica Feb 16 '20 at 17:23