I have encountered a peculiar problem while working with Kafka consumers. When I have a topic with several partitions and a consumer group, consumption eventually becomes unbalanced if the number of consumers is less than the number of partitions. For example, with 8 partitions and 4 consumers, I see something like this:
Client  Partition  Lag
C1      P0         1000000
C1      P1         1000000
C2      P2         0
C2      P3         0
C3      P4         1000000
C3      P5         1000000
C4      P6         0
C4      P7         0
So some clients have zero lag and sit idle, while others have a large lag and work hard but still fall behind. I could of course run 8 clients, but the workload only needs four; the problem is that Kafka assigns partitions in such a way that only two of the four actually do any work. I could also assign partitions manually, but that would complicate the application logic considerably. I'm quite happy with Kafka's consumer group capabilities, except for this one annoying balance problem.
So I wonder whether there is any way to adjust for this automatically, i.e. to reassign partitions among the clients so that the work is distributed equally. I know there was a proposal for something like that, but nothing seems to be happening there, so I'm looking for a way to do it within existing means. I am currently using the kafka-python driver, so ideally the solution would be implementable in Python, without having to move the whole system to Java.
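To make "distribute the work equally" concrete, here is a minimal sketch of the kind of lag-aware assignment meant, written in plain Python with no Kafka dependency. It assumes the per-partition lag is already known (how to obtain it is out of scope here), and the function name `assign_by_lag` is hypothetical, not part of kafka-python. It uses a simple greedy heuristic (heaviest partition first, given to the least-loaded consumer), which is only one possible approach.

```python
import heapq


def assign_by_lag(lags, consumers):
    """Hypothetical helper: greedily spread partitions over consumers by lag.

    lags      -- dict mapping partition name to its current lag
    consumers -- list of consumer names
    Returns a dict mapping each consumer to its list of partitions.
    """
    # Heap entries are (total lag so far, partitions held, consumer name),
    # so ties on lag are broken by partition count, then by name.
    heap = [(0, 0, c) for c in consumers]
    heapq.heapify(heap)
    assignment = {c: [] for c in consumers}

    # Visit partitions from largest lag to smallest (ties by name).
    for partition, lag in sorted(lags.items(), key=lambda kv: (-kv[1], kv[0])):
        total, count, consumer = heapq.heappop(heap)
        assignment[consumer].append(partition)
        heapq.heappush(heap, (total + lag, count + 1, consumer))
    return assignment


# The lag figures from the table above.
lags = {
    "P0": 1000000, "P1": 1000000, "P2": 0, "P3": 0,
    "P4": 1000000, "P5": 1000000, "P6": 0, "P7": 0,
}
assignment = assign_by_lag(lags, ["C1", "C2", "C3", "C4"])
totals = {c: sum(lags[p] for p in parts) for c, parts in assignment.items()}
```

With the figures above, each of the four consumers ends up with one lagging and one idle partition, instead of two consumers holding all four lagging ones.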