
I'm creating a consumer with confluent-kafka in Python. I want it to behave so that whenever the consumer is restarted, it starts from the last available message in the topic (per partition), even if that means skipping messages that were never committed.

This is to avoid processing millions of messages that were produced while the consumer was down and that no longer need to be processed.

I tried different values for the auto.offset.reset parameter, but at best the consumer starts from the last committed offset. This is my configuration:

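from confluent_kafka import Consumer

group_id = "my-group"  # hypothetical group id for illustration

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": group_id,
                     "auto.offset.reset": "latest",
                     "isolation.level": "read_committed",
                     # enable.auto.commit is a consumer-level property, not a topic-level one
                     "enable.auto.commit": False})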

Is there any option to achieve this behavior?

Note: I might have multiple consumers, but none of them is manually assigned to a specific partition.

Rodrigo A

1 Answer


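The auto.offset.reset configuration is only applied when there is no committed offset for a partition (or when the committed offset is no longer valid, for example because the data was deleted by retention).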

If you always want to restart from the end, you can disable auto commit with enable.auto.commit=false (and make sure you don't commit explicitly either), and set auto.offset.reset to latest.
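A minimal sketch of this first option, assuming a broker at localhost:9092 and a hypothetical group id latest-only-group. The helper names latest_only_config() and consume_forever() are illustrative, not part of confluent-kafka. It only behaves as described if the group has no committed offsets yet (for example a fresh group.id, or after the broker has expired the old offsets), because an existing committed offset still takes precedence over auto.offset.reset.

```python
def latest_only_config(bootstrap_servers, group_id):
    """Consumer settings that never store progress, so every restart falls
    back to auto.offset.reset (assuming the group has no committed offsets)."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        # Nothing is committed automatically...
        "enable.auto.commit": False,
        # ...so with no committed offset the consumer starts at the end.
        "auto.offset.reset": "latest",
        "isolation.level": "read_committed",
    }


def consume_forever(topic, bootstrap_servers="localhost:9092",
                    group_id="latest-only-group"):  # hypothetical group id
    # Imported here so the config helper above works without the client installed.
    from confluent_kafka import Consumer

    consumer = Consumer(latest_only_config(bootstrap_servers, group_id))
    consumer.subscribe([topic])
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                continue
            if msg.error():
                print(f"Consumer error: {msg.error()}")
                continue
            print(f"{msg.topic()}[{msg.partition()}]@{msg.offset()}: {msg.value()!r}")
            # Deliberately no consumer.commit(): a commit would make the
            # next restart resume here instead of at the end of the topic.
    finally:
        consumer.close()
```

The trade-off is that the group never records its progress, so nothing downstream can rely on committed offsets for this group.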

Another option is to explicitly seek to the end whenever a partition is assigned to a consumer (using the on_assign callback of subscribe()), with a combination of get_watermark_offsets() and seek().
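A sketch of the second option, assuming the default eager partition assignment strategy (with cooperative-sticky, incremental_assign() would be needed instead). Rather than calling seek(), which can fail with an erroneous-state error if the partition is not yet being fetched, it uses the common pattern of setting the offset on each TopicPartition inside on_assign and passing the list to assign(). The function names, broker address and group id are hypothetical. Because the group coordinator hands each consumer its partitions and the callback simply receives them, no partition has to be named explicitly, even when consumers join or leave dynamically.

```python
def seek_to_end_on_assign(consumer, partitions):
    """on_assign callback: start every newly assigned partition at its
    current high watermark, ignoring any committed offset."""
    for tp in partitions:
        # get_watermark_offsets() queries the broker and returns (low, high);
        # high is the offset the next produced message will receive.
        watermarks = consumer.get_watermark_offsets(tp, timeout=5.0)
        if watermarks is not None:
            _low, high = watermarks
            tp.offset = high
        # If no watermarks came back, tp.offset is left unchanged and the usual
        # committed-offset / auto.offset.reset logic applies to that partition.
    # Re-assigning with the modified offsets overrides where fetching starts.
    consumer.assign(partitions)


def consume_from_end(topic, bootstrap_servers="localhost:9092",
                     group_id="my-group"):  # hypothetical group id
    # Imported here so the callback above can be reused or tested
    # without the client library installed.
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        "isolation.level": "read_committed",
    })
    # The group coordinator decides which partitions this consumer gets;
    # the callback receives them, so none is hard-coded here.
    consumer.subscribe([topic], on_assign=seek_to_end_on_assign)
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                continue
            if msg.error():
                print(f"Consumer error: {msg.error()}")
                continue
            print(f"{msg.topic()}[{msg.partition()}]@{msg.offset()}: {msg.value()!r}")
    finally:
        consumer.close()
```

Since commits are ignored on assignment, this variant can keep committing normally. Note that it jumps to the end on every rebalance, not only on restart, so messages pending on a partition that moves between live consumers are skipped as well.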

Mickael Maison
  • Hi, thanks for your answer. It raises a couple of questions: if auto commit is set to false and I don't commit explicitly, does this mean I can no longer achieve exactly-once behavior? Also, if I use the seek and watermark options, I have to explicitly tell the consumer which partition to look at, and since I have multiple consumers and might create new ones dynamically, I don't know if this could be a problem – Rodrigo A Jun 17 '21 at 16:57