
I have a script to test at-least-once consumption.

The producer:

import json
import random
import time

from confluent_kafka import Producer
import config

p = Producer({'bootstrap.servers': ','.join(config.KAFKA_HOST)})
total_count = 0
c = 0
try:
    for i in range(20000):
        num = random.randint(1, 1000000)
        total_count += num
        a = {'t': num, 'time': time.time()}
        p.produce('test-topic-vv', json.dumps(a))
        c += 1
        if c % 100 == 0:
            p.flush()
finally:
    p.flush()
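
Not part of the original script, but for completeness: produce() is asynchronous, so a delivery callback is the usual way to confirm that each message actually reached the broker. A minimal sketch (the delivery_report helper below is hypothetical):

import json
import time

from confluent_kafka import Producer
import config

def delivery_report(err, msg):
    # Invoked from poll()/flush() once the broker acks (or rejects) the message.
    if err is not None:
        print('delivery failed: {}'.format(err))
    else:
        print('delivered to partition {} at offset {}'.format(msg.partition(), msg.offset()))

p = Producer({'bootstrap.servers': ','.join(config.KAFKA_HOST)})
p.produce('test-topic-vv', json.dumps({'t': 1, 'time': time.time()}), callback=delivery_report)
p.flush()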

The consumer:

import json
import random
import sys

from confluent_kafka import Consumer, TopicPartition
import config
c = Consumer({
    'bootstrap.servers': ','.join(config.KAFKA_HOST),
    'group.id': 'test-topic-consumer-group',
    'auto.offset.reset': 'earliest',
    'enable.auto.offset.store': False,
    'enable.auto.commit': True,
})
topic = 'test-topic-vv'

def test_for_seek():
    try:
        pp = []
        pp.append(TopicPartition(topic, partition=8))
        c.assign(pp)
        while True:
            msgs = c.consume(num_messages=10, timeout=10)
            if not msgs:
                print('no data and wait')
                for i in c.assignment():
                    print(i.topic, i.partition, i.offset, c.get_watermark_offsets(i))
                continue
            for msg in msgs:
                t1 = msg.partition()
                o1 = msg.offset()
                print('Received message: {} par {} offset {}'.format(msg.value().decode('utf-8'), t1, o1))
            break
    finally:
        c.close()

def test_for_run():
    try:
        c.subscribe([topic])
        total_count = 0
        map_par = {}
        while True:
            msgs = c.consume(num_messages=10, timeout=5)
            if not msgs:
                print('no data and wait')
                for i in c.assignment():
                    print(i.topic, i.partition, i.offset, c.get_watermark_offsets(i))
                continue
            deald = []
            for msg in msgs:
                t1 = msg.partition()
                o1 = msg.offset()
                print('Received message: {} par {} offset {}'.format(msg.value().decode('utf-8'), t1, o1))
                if random.randint(1, 100) == 9:
                    # test for deal failed then retry again
                    print('deal failed will retry msg offset {} partition {}'.format(msg.offset(), msg.partition()))
                    break
                else:
                    total_count += json.loads(msg.value())['t']
                    # test for deal success
                    if t1 in map_par:
                        if map_par[t1] + 1 != o1:
                            raise Exception('deal partition {} expected last offset {} current offset {}'.format(t1, map_par[t1], o1))
                    map_par[t1] = o1
                    c.store_offsets(msg)
                    deald.append(msg)
            group_partition = {}
            for msg in msgs:
                if msg in deald:
                    continue
                partition = msg.partition()
                offset = msg.offset()
                if partition in group_partition:
                    group_partition[partition] = min(group_partition[partition], offset)
                else:
                    group_partition[partition] = offset
            # seek to deal failed partition offset
            for k, v in group_partition.items():
                c.seek(TopicPartition(topic, partition=k, offset=v))
                print('deal failed will set msg offset {} partition {}'.format(v, k))
    finally:
        c.close()

if len(sys.argv) > 1 and sys.argv[1] == 'test_for_seek':
    test_for_seek()
else:
    test_for_run()

The topic test-topic-vv has 9 partitions.

First I ran the producer to add some messages to the topic, then consumed them, but I got an exception:

Screenshot: https://user-images.githubusercontent.com/12459874/194990350-8cd13128-f3fa-4a86-a93e-771af45f93f0.png

The offset of the latest message in partition 8 should have been 7382, but the consumer received 7391.

Then I ran test_for_seek to check the offset actually recorded for the consumer group; it was indeed 7382.

Screenshot: https://user-images.githubusercontent.com/12459874/194990593-9b8431d0-ce07-4122-800d-f9b3c129f5f3.png

I also checked the group offset recorded on the broker:

Screenshot: https://user-images.githubusercontent.com/12459874/194990684-9d8ad773-a569-4cee-9d4c-0a898e8f8922.png

It was also 7382.
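
For reference, a minimal sketch of how the stored group offset and the partition watermarks can also be queried directly from Python (this check is not part of the original scripts; it assumes the same config module, topic, and group id):

from confluent_kafka import Consumer, TopicPartition
import config

chk = Consumer({
    'bootstrap.servers': ','.join(config.KAFKA_HOST),
    'group.id': 'test-topic-consumer-group',
    'enable.auto.commit': False,
})
tp = TopicPartition('test-topic-vv', 8)
# committed() asks the broker for the group's stored offset for this partition.
committed = chk.committed([tp], timeout=10)[0]
# get_watermark_offsets() returns the (low, high) offsets currently on the broker.
low, high = chk.get_watermark_offsets(tp, timeout=10)
print('committed offset:', committed.offset, 'watermarks:', low, high)
chk.close()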

So what happened to the consumer when using seek to manage offsets? I hope someone can help me figure out the problem.

Environment information:

  • confluent_kafka.version()==1.9.2
  • confluent_kafka.libversion()==1.9.2
  • Operating system: Ubuntu 16.04
  • Python 3.8
  • Kafka 2.11-1.1.1
  • Can you please explain what your consumer is trying to actually do? More specifically, if you want to manually manage offsets, you'll want to disable auto commits – OneCricketeer Oct 11 '22 at 14:49
  • @OneCricketeer I have disabled enable.auto.offset.store, so offsets are only stored when I call store_offsets. Therefore the offset should only change when a message is processed successfully. – dakang Oct 12 '22 at 01:14
  • I meant `enable.auto.commit` – OneCricketeer Oct 12 '22 at 01:47
  • https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#at-least-once-processing you can check the different with this article – dakang Oct 12 '22 at 04:24
  • Okay, so I think you should try pausing your consumer before you seek it. Otherwise, you're looping over the same batch of records twice only to build some intermediate data structures which don't seem necessary. I cannot tell if you're trying to seek past a bad record, or commit 10-N offsets, where 0 <= N <= 10, then poll. If so, you should commit the offset before the failed event, not use seek, then polling again will retry the earliest failed offset from the previous batch. – OneCricketeer Oct 12 '22 at 13:11
  • This will also give you at least once processing, not exactly once, because your lists and dictionaries only live in memory, so if your app restarts, those lists/dicts are cleared, and you'll be unable to know which events after the committed/seeked offset had been processed already – OneCricketeer Oct 12 '22 at 13:15
  • @OneCricketeer The consumer's fetch position for a partition advances as messages are polled; it keeps the max fetched offset as the new position even if I don't commit. So polling again would skip the failed message, which is why I used seek to go back to it. https://stackoverflow.com/questions/47543771/kafkaconsumer-position-vs-committed/47548737#47548737 explains the distinction. – dakang Oct 13 '22 at 04:40
  • No, the group offset is only changed when committed, not polled. You're not required to use groups to poll data... I understand fine what seek/commit do. I don't understand why you are wanting to track offsets in an in-memory data structure when you could instead use a dead-letter topic, for example – OneCricketeer Oct 13 '22 at 13:29
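
Following the pause-before-seek suggestion in the comments, a minimal sketch of what that could look like inside test_for_run after a message fails (this reuses c, topic, and msg from the consumer above and only illustrates the commenter's idea, not a confirmed fix):

from confluent_kafka import TopicPartition

failed_tp = TopicPartition(topic, partition=msg.partition(), offset=msg.offset())
c.pause([failed_tp])    # stop fetching from this partition
c.seek(failed_tp)       # move the fetch position back to the failed offset
c.resume([failed_tp])   # resume fetching; the next consume() retries the message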
