
I have a service which reads from a Kafka topic using librdkafka. I've noticed that if the consumer shuts down for a while, some log entries build up in Kafka (this is perfectly fine and expected).

What's weird is that sometimes, when I start the consumer back up and look at the pending log entries by partition, partitions assigned to the same consumer seem to recover at different rates.

For example, say I have a consumer X and it claims partitions 30 through 50. When the consumer starts there are 10,000 entries pending on each.

What I see is the pending entries for 30-40 trending downward while the pending entries for 41-50 grow. When 30-40 finally hit zero (or get close enough to zero), 41-50 start trending downward.

Why is this happening? Is it a client feature or a server feature?

Lee Avital

1 Answer


The way Kafka works, a consumer keeps switching between its partitions to fetch data. However, Kafka is smart enough to handle only as many partitions at a time as your consumer's capacity allows. Had your consumer been more powerful (in terms of server performance), it would take on a few more partitions at once; either way, it picks up the remaining partitions in a second pass after it is done with the first ones. In summary: with X partitions, you expect the consumer to go through all of them one by one before revisiting the first, but that would hurt performance because of the extra switching effort. In your case, I understand that the other partitions also carry business data and you don't want to delay them heavily, so I suggest reducing the number of partitions.
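
To make the "first group drains, second group waits" pattern concrete, here is a toy simulation in Python. It is only a sketch of the behaviour described in the question, not of how librdkafka or the broker actually schedule fetches; the function name `simulate`, the `active` working-set size, and all rates are hypothetical values chosen for illustration.

```python
def simulate(num_partitions=4, active=2, backlog=100,
             produce_rate=1, drain_rate=10, steps=30):
    """Toy model (assumption, not librdkafka's real logic): the consumer
    drains only `active` partitions at a time and moves to the next group
    once the current group's backlog reaches zero."""
    lag = [backlog] * num_partitions
    history = [list(lag)]   # history[t][p] = pending entries on partition p at step t
    start = 0               # first partition of the current working set

    for _ in range(steps):
        # Producers keep appending to every partition.
        lag = [pending + produce_rate for pending in lag]

        # The consumer only makes progress on the current working set.
        working = range(start, min(start + active, num_partitions))
        for p in working:
            lag[p] = max(0, lag[p] - drain_rate)

        history.append(list(lag))

        # Once the working set is fully drained, switch to the next group.
        if all(lag[p] == 0 for p in working):
            start += active
            if start >= num_partitions:
                start = 0

    return history


if __name__ == "__main__":
    for step, lags in enumerate(simulate()):
        print(step, lags)
```

With the default values, the backlog of partitions 0-1 shrinks while partitions 2-3 keep growing, and partitions 2-3 only start shrinking after the first group reaches zero, which matches the shape of the graphs described in the question.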

Mayank J

Thanks. So is it the Kafka server that decides that the consumer can only handle those partitions, or is it the consumer? Also, what if I change the consuming logic so that a consumer can handle more partitions without adding more hardware? – Lee Avital Apr 08 '19 at 14:11