I am trying to read a Kafka topic from the earliest offset and then tombstone certain records through a Python script. Since there are a huge number of messages (over a million), I want to leverage multiprocessing to make consumption faster. Here's a snippet from the script:
from kafka import KafkaConsumer

def cleanup_kafka_topic(self, env):
    # Declarations
    consumer = KafkaConsumer(<topic_name>, group_id=<some_group>,
                             bootstrap_servers=[<kafka_host:kafka_port>],
                             auto_offset_reset='earliest', enable_auto_commit=True)
    # Clean-up logic
    for msg in consumer:
        # Do something with the msg
I am using kafka-python.
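For reference, here is a rough sketch of the direction I have in mind: start one worker process per partition (or fewer), create the KafkaConsumer inside each worker, and rely on the shared group_id to spread partitions across the workers, with a tombstone produced as the same key and a None value. The topic, group, broker names, worker count, and the should_tombstone() predicate below are placeholders, not my actual values:

from multiprocessing import Process
from kafka import KafkaConsumer, KafkaProducer

TOPIC = 'my_topic'             # placeholder
GROUP = 'cleanup_group'        # placeholder
BROKERS = ['kafka_host:9092']  # placeholder

def should_tombstone(msg):
    # Placeholder predicate: the real clean-up logic decides which records to delete
    return False

def worker():
    # Create the clients inside the process; kafka-python clients are not fork-safe
    consumer = KafkaConsumer(TOPIC, group_id=GROUP, bootstrap_servers=BROKERS,
                             auto_offset_reset='earliest', enable_auto_commit=True,
                             consumer_timeout_ms=30000)  # stop iterating once the topic is drained
    producer = KafkaProducer(bootstrap_servers=BROKERS)
    for msg in consumer:
        if should_tombstone(msg):
            # A tombstone is the same key with a None value; the broker removes it
            # during compaction (the topic needs cleanup.policy=compact)
            producer.send(TOPIC, key=msg.key, value=None)
    producer.flush()
    consumer.close()

if __name__ == '__main__':
    processes = [Process(target=worker) for _ in range(4)]  # <= number of partitions
    for p in processes:
        p.start()
    for p in processes:
        p.join()

My understanding is that running more processes than the topic has partitions would leave the extra consumers idle, so the worker count above is capped at the partition count.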