When initializing an aiokafka consumer with a ConsumerRebalanceListener, we have a routine that processes each batch of messages fetched with getmany(). We also call this routine from on_partitions_revoked, to make sure any in-flight processing finishes before a rebalance completes.

If a rebalance occurs while the routine is already running, on_partitions_revoked would invoke it a second time and messages could be processed twice; to avoid this, we take a lock at the start of the routine.

I'm not sure the lock is actually needed in this scenario, so I'd appreciate any advice. We happen to use aiokafka here, but I suspect this is a general Kafka question.
import asyncio
from typing import List

from aiokafka import AIOKafkaConsumer
from aiokafka.abc import ConsumerRebalanceListener


class KafkaWrapper(ConsumerRebalanceListener):
    def __init__(
        self,
        consumer_bootstrap_servers: List[str],
        consumer_topic: str,
        consumer_group_id: str,
    ):
        self.records_lock = asyncio.Lock()
        self.should_stop = asyncio.Event()
        self.messages = {}
        self.kafka_consumer = AIOKafkaConsumer(
            bootstrap_servers=consumer_bootstrap_servers,
            group_id=consumer_group_id,
        )
        self.kafka_consumer.subscribe(topics=[consumer_topic], listener=self)

    async def process_records(self):
        async with self.records_lock:  # Is this lock really required??
            ...  # process self.messages

    async def on_partitions_revoked(self, _revoked):
        await self.process_records()

    async def on_partitions_assigned(self, _assigned):
        pass

    async def _watch_kafka(self):
        await self.kafka_consumer.start()
        while not self.should_stop.is_set():
            messages = await self.kafka_consumer.getmany()
            if messages:
                async with self.records_lock:
                    self.messages = messages
                # Called outside the `async with` block: asyncio.Lock is not
                # reentrant, so calling process_records() while still holding
                # records_lock would deadlock.
                await self.process_records()