Setup:
- 120 Python confluent-kafka consumers, all subscribing to the same set of topics
- 8 topics with varying partition counts: one topic with 84 partitions, several with 40-50 partitions, and the rest with 1-10 partitions. Around 300 partitions in total.
I use pretty standard subscription code:
```python
def __init__(self, kafka_broker_list: str, group_id: str, topics: List[str]):
    from confluent_kafka import Consumer
    self._consumer = Consumer({
        'bootstrap.servers': kafka_broker_list,
        'fetch.max.bytes': 50 * 1024 * 1024,  # 50 MB
        'auto.offset.reset': 'earliest',
        'group.id': group_id,
        'enable.auto.commit': True
    })
    logging.info(f"Subscribing to topics: {topics}")
    self._consumer.subscribe(topics, on_assign=self._on_assign, on_revoke=self._on_revoke)
```
The problem: Out of the 120 consumers I start, only 84 (the same number as the partition count of the largest topic) receive a partition assignment; the rest get no partitions and remain idle. What's worse, the distribution is heavily skewed: I usually see ~5 consumers with about 10 assigned partitions each, some with 8, many with 2-4, and also many consumers with only a single partition. I believe the "first" consumers to subscribe get the most partitions, until the available partitions of each topic are exhausted.
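This skew is consistent with a range-style assignor, which assigns each topic's partitions to the sorted consumer list independently, so no more consumers than the largest topic's partition count can ever receive work. A small simulation illustrates this; the per-topic partition counts below are my assumption, chosen only to roughly match the setup above:

```python
# Simulate a range assignor: for each topic independently, partitions are
# divided contiguously among the sorted consumers, the first `extra`
# consumers receiving one more; with more consumers than partitions,
# the trailing consumers get nothing for that topic.
def range_assign(partition_counts, num_consumers):
    """Return the number of partitions assigned to each consumer index."""
    assigned = [0] * num_consumers
    for p in partition_counts:
        base, extra = divmod(p, num_consumers)
        for c in range(num_consumers):
            assigned[c] += base + (1 if c < extra else 0)
    return assigned


# Assumed partition counts: 84 + several topics in the 40-50 range + small ones.
counts = range_assign([84, 50, 48, 45, 42, 10, 8, 5], 120)
busy = sum(1 for n in counts if n > 0)
print(busy)         # -> 84: only 84 of 120 consumers get any partitions
print(max(counts))  # -> 8: the first consumers accumulate the most
```

With these numbers, exactly 84 consumers end up with work, the first few hold 8 partitions each, and a long tail holds just one, which mirrors the distribution I observe.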
The questions:
- I read about the `partition.assignment.strategy` configuration property available to Java consumers, but I couldn't find it for the Confluent Kafka client. Is there a way to configure an assignment strategy in the Confluent Kafka Python client?
- Is there a way to set a partition assignment strategy on the broker side, or per topic or per consumer group?
- Alternatively, is there a different way to distribute the load across all consumers?
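For context on the first question: the underlying librdkafka library does document a `partition.assignment.strategy` property (default `range,roundrobin`), and the Python client passes its config dict through to librdkafka, so I would expect something like the following to work, though whether it matches the Java client's behavior is exactly what I'm asking about:

```python
# Assumption to verify against your confluent-kafka/librdkafka version:
# 'roundrobin' spreads partitions across all subscribed consumers instead
# of assigning each topic's partitions independently like 'range' does.
consumer_config = {
    'bootstrap.servers': 'localhost:9092',   # placeholder broker list
    'group.id': 'my-group',                  # placeholder group id
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': True,
    'partition.assignment.strategy': 'roundrobin',
}
```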
Thank you for taking the time to read my question :)