
I'm an intern trying to write a script that runs as an hourly cron job, collecting the messages from a Kafka topic that arrived within a given time window. For example, one run processes the messages that arrived between 09:00 AM and 10:00 AM, the next run those between 10:00 AM and 11:00 AM, and so on. Should I use offsets to tackle this, or write the window's end time to a file and read it back on the next run?

I'm following this documentation: https://docs.confluent.io/kafka-clients/python/current/overview.html#id1, but I'm not sure how to obtain two different offsets so that I consume only the messages between the last read offset and the current/new offset.
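Based on the client's API reference, I sketched what I think the offset-based approach would look like. The broker address, topic name, and group id below are placeholders, and I haven't verified this end to end: `offsets_for_times()` should map each window-boundary timestamp to the first offset at or after it, which would avoid keeping any state file.

```python
# Sketch of the offset-based approach (BOOTSTRAP, TOPIC, and the group id
# are placeholders): offsets_for_times() maps each window-boundary timestamp
# to the first offset at or after it, so no state file is needed.
from datetime import datetime, timedelta

from confluent_kafka import Consumer, TopicPartition

BOOTSTRAP = 'localhost:9092'  # placeholder broker address
TOPIC = 'my-topic'            # placeholder topic name

consumer = Consumer({
    'bootstrap.servers': BOOTSTRAP,
    'group.id': 'hourly-window-reader',  # placeholder group id
    'enable.auto.commit': False,  # offsets come from timestamps, not commits
})

# Boundaries of the previous full hour, in epoch milliseconds.
window_end = datetime.now().replace(minute=0, second=0, microsecond=0)
window_start = window_end - timedelta(hours=1)
start_ms = int(window_start.timestamp() * 1000)
end_ms = int(window_end.timestamp() * 1000)

# Ask the broker for the first offset at/after each boundary, per partition.
partitions = list(consumer.list_topics(TOPIC, timeout=10).topics[TOPIC].partitions)
start_tps = consumer.offsets_for_times(
    [TopicPartition(TOPIC, p, start_ms) for p in partitions], timeout=10)
end_tps = consumer.offsets_for_times(
    [TopicPartition(TOPIC, p, end_ms) for p in partitions], timeout=10)

# offset == -1 means no message exists at/after the timestamp; fall back to
# the high watermark so such partitions are drained to their current end.
end_offsets = {}
for tp in end_tps:
    if tp.offset >= 0:
        end_offsets[tp.partition] = tp.offset
    else:
        _, high = consumer.get_watermark_offsets(
            TopicPartition(TOPIC, tp.partition), timeout=10)
        end_offsets[tp.partition] = high

# Start reading each partition at the window start; skip empty partitions.
assignment = [tp for tp in start_tps if tp.offset >= 0]
consumer.assign(assignment)

active = {tp.partition for tp in assignment}
while active:
    msg = consumer.poll(timeout=1.0)
    if msg is None:
        continue
    if msg.error():
        continue
    part = msg.partition()
    if msg.offset() < end_offsets[part]:
        print(msg.value())  # stand-in for the real msg_process(msg)
    if msg.offset() + 1 >= end_offsets[part]:
        # This partition has reached the window end: stop reading it.
        consumer.pause([TopicPartition(msg.topic(), part)])
        active.discard(part)

consumer.close()
```

I'm using `assign()` instead of `subscribe()` here so that the group's committed offsets don't interfere with the time-based seek, but I'd appreciate confirmation that this is the right way to do it.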

I tried the following, but it gives only the latest message, not all the messages that arrived between the last read offset and the new offset.

```python
import sys

from confluent_kafka import KafkaError, KafkaException

running = True

def basic_consume_loop(consumer, topics):
    try:
        consumer.subscribe(topics)

        while running:
            msg = consumer.poll(timeout=1.0)
            if msg is None:
                continue

            if msg.error():
                if msg.error().code() == KafkaError._PARTITION_EOF:
                    # End of partition event
                    sys.stderr.write('%% %s [%d] reached end at offset %d\n' %
                                     (msg.topic(), msg.partition(), msg.offset()))
                else:
                    raise KafkaException(msg.error())
            else:
                msg_process(msg)  # msg_process is defined elsewhere in my script
    finally:
        # Close down consumer to commit final offsets.
        consumer.close()
```
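And for comparison, a sketch of the state-file alternative from my first paragraph (the file path and helper names are made up). If `offsets_for_times()` works the way I hope, this bookkeeping becomes optional:

```python
# Sketch of the file-based alternative (STATE_FILE path and helper names are
# made up): each cron run reads the previous run's end timestamp and records
# its own, so consecutive runs cover contiguous windows.
import json
from pathlib import Path

STATE_FILE = Path('/var/tmp/kafka_window_state.json')  # placeholder path

def load_window_start(default_ms):
    """Return the previous run's end timestamp (ms), or a default on first run."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())['end_ms']
    return default_ms

def save_window_end(end_ms):
    """Record where this run stopped so the next cron run resumes there."""
    STATE_FILE.write_text(json.dumps({'end_ms': end_ms}))
```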