I'm an intern trying to write a script that runs as a cron job (hourly) and collects the messages from a Kafka topic that arrived between two points in time. For example, it should process the messages that arrived between 9:00 AM and 10:00 AM, then between 10:00 AM and 11:00 AM, and so on. Should I use offsets to tackle this, or should I write the end time to a file and read it back on the next run?
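To make the window concrete: each time the cron job fires, I'd floor the current time to the top of the hour and take the preceding hour as the window. A minimal sketch of that idea:

```python
from datetime import datetime, timedelta

# A run at 10:07 AM should yield the 9:00 AM - 10:00 AM window.
window_end = datetime.now().replace(minute=0, second=0, microsecond=0)
window_start = window_end - timedelta(hours=1)
```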
I'm following this documentation: https://docs.confluent.io/kafka-clients/python/current/overview.html#id1 but I'm not sure how to get two different offsets so that I consume only the messages between the last read offset and the current/new offset.
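From the API docs it looks like `offsets_for_times()` could translate the two window boundaries into offsets, which I could then consume between. Below is a rough, untested sketch of what I'm picturing; the broker address, topic, and group id are placeholders, and I haven't worked through the edge cases:

```python
from datetime import datetime, timedelta
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',  # placeholder
    'group.id': 'hourly-batch',             # placeholder
    'enable.auto.commit': False,
})

topic = 'my_topic'  # placeholder
window_end = datetime.now().replace(minute=0, second=0, microsecond=0)
window_start = window_end - timedelta(hours=1)
start_ms = int(window_start.timestamp() * 1000)  # boundaries in epoch milliseconds
end_ms = int(window_end.timestamp() * 1000)

# Build one TopicPartition per partition, carrying the boundary timestamp
# in the offset field, as offsets_for_times() expects.
partition_ids = list(consumer.list_topics(topic).topics[topic].partitions)
start_offsets = consumer.offsets_for_times(
    [TopicPartition(topic, p, start_ms) for p in partition_ids])
end_offsets = consumer.offsets_for_times(
    [TopicPartition(topic, p, end_ms) for p in partition_ids])

# offsets_for_times() returns, per partition, the earliest offset whose
# timestamp is >= the given time, or -1 if no such message exists yet.
end_by_partition = {}
for tp in end_offsets:
    if tp.offset == -1:
        # Nothing has arrived after the window yet: stop at the high watermark.
        _, high = consumer.get_watermark_offsets(tp)
        end_by_partition[tp.partition] = high
    else:
        end_by_partition[tp.partition] = tp.offset

# Start each partition at the first offset inside the window.
consumer.assign([tp for tp in start_offsets if tp.offset != -1])
active = {tp.partition for tp in start_offsets if tp.offset != -1}

done = set()
while done < active:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    if msg.offset() >= end_by_partition[msg.partition()]:
        done.add(msg.partition())  # first message at/after the window end
        continue
    print(msg.value())  # placeholder for the real processing

consumer.close()
```

Is assigning offsets like this the right idea, or is it more usual to persist the end offset (or end time) somewhere between runs?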
So far I've only tried the basic consume loop from the docs (below), but it gives me only the latest message, not all the messages that arrived between the last read offset and the new offset.
```python
import sys
from confluent_kafka import Consumer, KafkaError, KafkaException

running = True

def msg_process(msg):
    # Placeholder for the real message handling.
    print(msg.value())

def basic_consume_loop(consumer, topics):
    try:
        consumer.subscribe(topics)
        while running:
            msg = consumer.poll(timeout=1.0)
            if msg is None:
                continue
            if msg.error():
                if msg.error().code() == KafkaError._PARTITION_EOF:
                    # End of partition event
                    sys.stderr.write('%% %s [%d] reached end at offset %d\n' %
                                     (msg.topic(), msg.partition(), msg.offset()))
                else:
                    raise KafkaException(msg.error())
            else:
                msg_process(msg)
    finally:
        # Close down consumer to commit final offsets.
        consumer.close()
```
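For reference, this is roughly how I'm constructing the consumer and invoking the loop (not my real config; all values here are placeholders):

```python
consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',  # placeholder
    'group.id': 'my-group',                 # placeholder
    'auto.offset.reset': 'earliest',
})
basic_consume_loop(consumer, ['my_topic'])
```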