
I'm trying to switch Python code from aiokafka to confluent_kafka and having problems with reading historical data.

The system has only one producer for a given topic, and several independent consumers (each with a separate group ID). When each consumer starts, it wants to read the most recent historical message for a subset of topics (call them historical topics), then read all new messages. The exact starting point of the historical data doesn't matter; the main point is to get information for topics that are rarely written to. The topics for which historical data are wanted will only ever have one partition.

It's getting the historical data that is giving me fits.

I would prefer not to have to read any messages before seeking, since the message is likely to be newer than I want to start with. But it appears one has to at least call Consumer.poll before Kafka assigns topic partitions.

What is the recommended sequence?

I have tried two basic approaches:

  • Use automatic topic partition assignment and the on_assign callback argument to Consumer.subscribe to read the current offset and call seek.
  • Manually assign partitions and use those partitions to read the current offset and call seek (a rough sketch of this variant appears after the next list).

In both cases:

  • Consumer.seek usually or always fails with "Local: Erroneous state".
  • Consumer.position always returns -1001, which might be a clue. To work around that I call Consumer.get_watermark_offsets.
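
For reference, the manual-assignment variant looked roughly like the sketch below. This is a simplified illustration rather than my exact code: the group id is a placeholder, and broker_addr, topic_names, and max_history are the same values defined in the full example further down.

from confluent_kafka import Consumer, TopicPartition

consumer = Consumer(
    {
        "group.id": "some-unique-group-id",  # placeholder
        "bootstrap.servers": broker_addr,
    }
)
partitions = [TopicPartition(name, 0) for name in topic_names]
consumer.assign(partitions)
for partition in partitions:
    _, high = consumer.get_watermark_offsets(partition)
    if high > 0:
        partition.offset = max(0, high - max_history)
        # this is the seek that fails with "Local: Erroneous state"
        consumer.seek(partition)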

Here is a simple example using on_assign:

from confluent_kafka import Consumer
from confluent_kafka.admin import AdminClient, NewTopic
from confluent_kafka.error import KafkaError
import base64
import os

max_history = 3
broker_addr = "broker:29092"
topic_names = ["test.topic"]


def seek_back(
    consumer,
    partitions,
):
    print(f"seek_back({partitions})")

    # Show that consumer.position returns nothing useful
    position_partitions = consumer.position(partitions)
    print(f"{position_partitions=}")

    for partition in partitions:
        _, offset = consumer.get_watermark_offsets(partition)
        print(f"{partition.topic} has offset {offset}")
        if offset <= 0:
            continue

        partition.offset = max(0, offset - max_history)
        try:
            consumer.seek(partition)
        except Exception as e:
            print(f"{partition.topic} seek to {partition.offset} failed: {e!r}")
        else:
            print(f"{partition.topic} seek to {partition.offset} succeeded")


def run(topic_names):
    random_str = base64.urlsafe_b64encode(os.urandom(12)).decode().replace("=", "_")
    consumer = Consumer(
        {
            "group.id": random_str,
            "bootstrap.servers": broker_addr,
            "allow.auto.create.topics": False,
        }
    )
    new_topic_list = [
        NewTopic(topic_name, num_partitions=1, replication_factor=1)
        for topic_name in topic_names
    ]
    broker_client = AdminClient({"bootstrap.servers": broker_addr})
    create_result = broker_client.create_topics(new_topic_list)
    for topic_name, future in create_result.items():
        exception = future.exception()
        if exception is None:
            continue
        elif (
            isinstance(exception.args[0], KafkaError)
            and exception.args[0].code() == KafkaError.TOPIC_ALREADY_EXISTS
        ):
            pass
        else:
            print(f"Failed to create topic {topic_name}: {exception!r}")
            raise exception

    consumer.subscribe(topic_names, on_assign=seek_back)
    while True:
        message = consumer.poll(timeout=0.1)
        if message is not None:
            error = message.error()
            if error is not None:
                raise error
            print(f"read {message=}")
            return


run(topic_names)

Running this after writing some messages for that topic (using other code) gives me:

seek_back([TopicPartition{topic=test.topic,partition=0,offset=-1001,error=None}])
position_partitions=[TopicPartition{topic=test.topic,partition=0,offset=-1001,error=None}]
test.topic has offset 10
test.topic seek to 7 failed: KafkaException(KafkaError{code=_STATE,val=-172,str="Failed to seek to offset 7: Local: Erroneous state"})

I am using confluent_kafka 1.8.2 and running the broker with the Docker image confluentinc/cp-enterprise-kafka:6.2.4 (along with the same version of zookeeper and schema registry, since my normal code uses Avro schemas).

Russell Owen
  • assigning partitions right after calling subscribe seems to help a bit: seek then succeeds, but the code still does not read the historical data (poll keeps returning None) and consumer.position still returns unknown even after calling consumer.poll – Russell Owen Aug 23 '22 at 19:31
  • `-1001` is `OFFSET_INVALID` https://github.com/edenhill/librdkafka/blob/master/src/rdkafka.h#L3498 – Mike Atlas Sep 20 '22 at 21:38

2 Answers

3

From https://github.com/confluentinc/confluent-kafka-python/issues/11#issuecomment-230089107 it appears that one solution is to specify an on_assign callback to Consumer.subscribe, then call Consumer.assign inside the on_assign callback, e.g.:

from confluent_kafka import OFFSET_BEGINNING

MAX_HISTORY = 3  # how many recent messages to (re)read per partition


def on_assign_callback(
    consumer,
    partitions,
):
    """Modify assigned partitions to read up to MAX_HISTORY old messages"""
    for partition in partitions:
        min_offset, max_offset = consumer.get_watermark_offsets(partition)
        desired_offset = max_offset - MAX_HISTORY
        if desired_offset <= min_offset:
            desired_offset = OFFSET_BEGINNING
        partition.offset = desired_offset
    consumer.assign(partitions)

Subtleties:

  • The callback must assign all topic partitions, even if you don't want historical data for some of the topics.
  • Construct the consumer with option "auto.offset.reset": "earliest". That way, if the broker deletes the data at the specified offset while the on_assign callback is running, the consumer will fall back to reading from the beginning (the sketch after this list shows the setting in context).
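
To put the pieces together, here is a minimal sketch of how the callback might be wired up; the group id, broker address, and topic name below are placeholders taken from the question, not part of the answer:

from confluent_kafka import Consumer

consumer = Consumer(
    {
        "group.id": "some-unique-group-id",   # placeholder
        "bootstrap.servers": "broker:29092",  # placeholder from the question
        "auto.offset.reset": "earliest",      # fall back to the start if the sought offset is gone
    }
)
consumer.subscribe(["test.topic"], on_assign=on_assign_callback)
while True:
    message = consumer.poll(timeout=1.0)
    if message is None:
        continue
    if message.error():
        raise Exception(message.error())
    print(f"read offset {message.offset()}: {message.value()}")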
Russell Owen
  • I'm trying to use this in a one-time group offset reset script, only intended to prep for other consumers. The consumer in this script is assigned all partitions, but it looks like only 1 of the partitions resets its offset at a time. My process is subscribe, poll for a message, check assignments, close. Will this callback happen on all partitions assigned? – user4446237 Apr 04 '23 at 14:47
0

I found your post because I was having similar challenges, and I have a solution that works for me. It is not based on the watermark offsets, but on the committed offset:

from confluent_kafka import OFFSET_BEGINNING, OFFSET_INVALID

# consumer was constructed elsewhere with "enable.auto.commit": False (see below)
consumer.subscribe([topic_name])
messages = []
seeked = False
while True:
    msg = consumer.poll(5)
    # committed() returns offset=OFFSET_INVALID (-1001) until the group has committed something
    tps_comm = consumer.committed(consumer.assignment())
    if len(tps_comm) == 0:
        # no partitions assigned yet; keep polling
        continue
    else:
        tp = tps_comm[0]
        if tp.offset == OFFSET_INVALID and not seeked:
            # nothing committed for this group yet: start from the beginning
            tp.offset = OFFSET_BEGINNING
            consumer.seek(tp)
            seeked = True
    if msg is None:
        continue
    elif msg.error():
        raise Exception(msg.error())
    else:
        print(f"got message at offset: {msg.offset()}")
        messages.append(msg)

I've omitted the max_messages and loop-timeout logic from my real solution in favor of the simpler code example above, which never breaks out of the loop.
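
If you need them, the omitted bounds might look something like the hypothetical sketch below (max_messages and loop_timeout are made-up names, not from my actual code); the committed-offset and seek handling from the loop above stays the same:

import time

max_messages = 100   # hypothetical cap on messages to collect
loop_timeout = 30.0  # hypothetical overall timeout, in seconds
deadline = time.monotonic() + loop_timeout

while len(messages) < max_messages and time.monotonic() < deadline:
    msg = consumer.poll(5)
    ...  # same committed-offset / seek / append handling as in the loop above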

What I've gathered is that when the consumer connects to the broker and subscribes to a topic, it doesn't get assigned a topic partition immediately, and not even quickly if your poll call uses too short a timeout. In testing, a few seconds might be enough to get it on the first try. But by retrying until the topic partition assignment comes back as a non-empty list, and then checking the committed offset for the group's partition assignment, my consumer can decide to seek to the beginning of the topic partition if needed; otherwise, in the normal case, poll will start returning any new uncommitted messages for the group's topic partition assignment.
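
Expressed as code, the wait-for-assignment step on its own might look roughly like this (a sketch reusing consumer and topic_name from above; the 30-second cap is an arbitrary choice for illustration):

import time

consumer.subscribe([topic_name])
deadline = time.monotonic() + 30.0  # arbitrary cap, for illustration only
while not consumer.assignment() and time.monotonic() < deadline:
    consumer.poll(1.0)  # polling is what drives the group join / partition assignment
print(f"assigned partitions: {consumer.assignment()}")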

Since my consumer needs to do other things with a message before committing it, I have "enable.auto.commit": False as a consumer configuration setting. Here is the separate code that receives the messages and commits their offsets after processing:

from confluent_kafka import TopicPartition

tp_offsets = []
for msg in messages:
    # commit the offset *after* each processed message so it is not re-read
    tp = TopicPartition(
        topic=msg.topic(),
        partition=msg.partition(),
        offset=msg.offset() + 1,
    )
    tp_offsets.append(tp)
consumer.commit(offsets=tp_offsets)

Note: the code above might need some re-work if you're subscribing to multiple topics.
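
For example, that rework might keep only the highest processed offset per (topic, partition) before committing. Here is a hypothetical sketch, not something I have tested with multiple topics:

from confluent_kafka import TopicPartition

# keep only the highest processed offset per (topic, partition)
highest = {}
for msg in messages:
    key = (msg.topic(), msg.partition())
    highest[key] = max(highest.get(key, -1), msg.offset())

# commit offset + 1 for each partition so processed messages are not re-read
tp_offsets = [
    TopicPartition(topic=t, partition=p, offset=o + 1)
    for (t, p), o in highest.items()
]
consumer.commit(offsets=tp_offsets)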

Mike Atlas
  • I find it interesting that your code only seeks back if the position is OFFSET_INVALID. My desire is to seek back a specific number of messages (typically 1). I need valid offsets for that. I tried your code and only ever saw OFFSET_INVALID. So far I just can't get `consumer.committed(...)` to return valid offsets. So far I prefer my solution, since it returns real offsets. Is there a technical reason yours is better? I'm no Kafka expert. – Russell Owen Sep 27 '22 at 22:55
  • In my experimentation, the consumer needs to subscribe *and poll* in order to be assigned a partition; this does not happen synchronously. After a short time passes, the broker will have assigned the consumer a partition; the request to get committed offset for my group id returns a valid value. This sort of makes sense when you have your `on_assign` callback doing something like seeking to an offset. If you don't make a `poll` call first, the callback won't be invoked because it has yet to be assigned a partition. At time of writing this post and comment: I'm no expert on Kafka either. – Mike Atlas Sep 29 '22 at 15:21
  • I did more experiments and found that after each call to Consumer.poll: (a) Consumer.committed always returns offset=-1001. (b) Consumer.position returns a known offset, but only after poll first returns data for that topic. I expected known offsets once partitions were assigned. Clearly lots to learn. – Russell Owen Sep 30 '22 at 16:47
  • Seems like the semantics here are that there's always an expected stream of new things to poll on. In my case the stream can sometimes be sparse, with no new messages. Either way, I've encapsulated this logic away from the app itself and can keep moving along. – Mike Atlas Oct 03 '22 at 13:12