I'm a beginner with Kafka and am trying to consume the latest unconsumed (unprocessed) messages on a topic. Below is the function I came up with.

It works, but it has a logical problem: it returns the last consumed message again and again even though there are no new messages in the topic. Ideally, I want only the latest unconsumed messages and, if there is nothing new, a return value stating "No new messages in the topic. All the messages are already consumed." I tried various options, such as setting the offset to earliest and latest, but nothing worked.

I'm kind of stumped, and any guidance would be a great help.

In my case, the last consumed offset is 140 and I would like to process messages from 141 onwards, but if 141 hasn't arrived, my function still returns the message at offset 140.

I'm using confluent_kafka.

from confluent_kafka import Consumer, TopicPartition

def get_topic_latest_offset_message(topic, broker, kafka_group="example-topic"):
    none_config = {}
    c = Consumer({"bootstrap.servers": broker, "group.id": kafka_group, "auto.offset.reset": "latest"})
    _ , high = c.get_watermark_offsets(TopicPartition(topic, 0), timeout=5)
    c.assign([TopicPartition(topic, 0, high - 1)])
    message = c.poll(timeout=5)
    c.close()
    if message is None:
        # keep calling until you get a not null message
        get_topic_latest_offset(topic, broker, kafka_group)
    elif message.error():
        raise TopicFetchError(topic)
    return json.loads(message.value().decode("utf-8"))

Rafa S

1 Answer

The issue is the offset you assign before polling. You assign high - 1, where high is the partition's high watermark, i.e. the offset the next produced message will receive. So high - 1 is the offset of the last message currently in the topic, and whenever no new message has arrived you get that same message again.

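To consume only unconsumed messages, start at the offset immediately after the last consumed message. In your case the last message is at offset 140, so assigning high (141) instead of high - 1 starts the consumer at offset 141. Note that high marks the end of the log, not your last consumed position, so it only equals "last consumed + 1" when the consumer has already read everything that is currently in the topic.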
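
To make the offset arithmetic concrete, here is a toy model with no Kafka involved. It assumes a single partition represented as a Python list whose index stands in for the offset; `high_watermark` and `fetch` are hypothetical helpers for illustration, not confluent_kafka calls.

```python
def high_watermark(log):
    """Offset the next produced message will receive (one past the last message)."""
    return len(log)


def fetch(log, start_offset):
    """Messages a consumer would see if it started reading at start_offset."""
    return log[start_offset:]


# Toy partition: the list index stands in for the Kafka offset, so offsets 0..140 exist.
log = [f"message-{i}" for i in range(141)]
high = high_watermark(log)           # 141

replayed = fetch(log, high - 1)      # the message at offset 140 again
nothing_yet = fetch(log, high)       # empty: offset 141 has not been written yet

# A new message arrives before the next call, so the high watermark moves to 142.
log.append("message-141")
skipped = fetch(log, high_watermark(log))   # empty: starting at 142 jumps past 141

# Starting from the last consumed offset + 1 picks the new message up.
last_consumed = 140
resumed = fetch(log, last_consumed + 1)
```

In this model, starting at the high watermark only returns something if a message is written while the consumer is already waiting. Anything produced between two calls is skipped, which is why tracking the last consumed offset matters.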

Try the code below.

import json

from confluent_kafka import Consumer, TopicPartition


class TopicFetchError(Exception):
    """Placeholder for the question's own exception, which is not shown there."""


def get_topic_latest_offset_message(topic, broker, kafka_group="example-topic"):
    c = Consumer({
        "bootstrap.servers": broker,
        "group.id": kafka_group,
        "auto.offset.reset": "latest"
    })

    # The high watermark is the offset the next produced message will receive.
    _, high = c.get_watermark_offsets(TopicPartition(topic, 0), timeout=5)
    offset = high  # Start at the next offset to be written, not the last existing one

    c.assign([TopicPartition(topic, 0, offset)])
    message = c.poll(timeout=5)
    c.close()

    if message is None:
        return "No new messages in the topic. All the messages are already consumed."
    elif message.error():
        raise TopicFetchError(topic)

    return json.loads(message.value().decode("utf-8"))

Note: If you're working with multiple partitions, you need to handle partition assignment and offset retrieval for each partition.
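
For the multi-partition case, one option is to stop assigning offsets by hand and let the consumer group track its own position: subscribe to the topic, poll, and commit each message after processing it. The sketch below assumes that approach. `make_consumer` and `drain_new_messages` are hypothetical helper names, the `auto.offset.reset` and `enable.auto.commit` values are my choices rather than anything from the question, and errors are raised as a plain `RuntimeError`.

```python
import json


def make_consumer(broker, kafka_group, topic):
    """Build a consumer that resumes from the group's committed offsets."""
    from confluent_kafka import Consumer  # imported lazily; only needed against a real broker

    consumer = Consumer({
        "bootstrap.servers": broker,
        "group.id": kafka_group,
        # Only used when the group has no committed offset yet (assumed choice).
        "auto.offset.reset": "earliest",
        # Commit explicitly after each message is processed (assumed choice).
        "enable.auto.commit": False,
    })
    # subscribe() lets the group coordinator hand out every partition of the topic.
    consumer.subscribe([topic])
    return consumer


def drain_new_messages(consumer, poll_timeout=5.0, max_messages=100):
    """Return the decoded values of messages the group has not committed yet."""
    values = []
    while len(values) < max_messages:
        message = consumer.poll(poll_timeout)
        if message is None:
            break  # nothing arrived within poll_timeout
        if message.error():
            raise RuntimeError(str(message.error()))
        values.append(json.loads(message.value().decode("utf-8")))
        # A synchronous commit stores offset + 1 as the group's next read position.
        consumer.commit(message=message, asynchronous=False)
    return values
```

An empty list from `drain_new_messages` then means there is nothing new for that group, and the caller can return the "No new messages in the topic" text in that case. Keep in mind that the first `poll` after `subscribe` may return `None` while the consumer is still joining the group, so a short timeout on the very first call can look like "no new messages".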

Aravind Pillai

Thank you for the suggestions and input. However, even though there is a new message, let's say `141`, it still prints `No new messages in the topic. All the messages are already consumed.` I introduced a sleep of 15 seconds in case there is a delay in messages arriving in the topic, but that didn't help. What's off here? – Rafa S Jul 05 '23 at 08:19