
I am using the confluent-kafka Python library to read from Kafka. I am using the following consumer settings:

Consumer = {
    "bootstrap.servers": kafka_server,
    "group.id": "testing",
    "auto.offset.reset": "latest",
}

My goal is to ensure that I am always reading the latest messages in Kafka. The above works as long as the program keeps running, but if the program crashes for some reason, it resumes from the message it last consumed instead of from the last message in the topic.

I don't mind losing a few messages, but it is absolutely necessary that I am always reading the latest messages. It looks like the consumer remembers the offset and starts from it instead of from the latest one.

I tried setting the enable.auto.commit parameter to False but I get the same results.

2 Answers


enable.auto.commit should be true if you want to achieve this.

Since you have enable.auto.commit=False, it is your code's (the consumer's) responsibility to commit the offset. If the process crashes, the offset may not have been committed, which is why your application starts again from the last committed message.

The 'latest' setting does not mean the consumer will skip messages and jump to the latest one; it only applies when the group has no committed offset at all.
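A minimal sketch of a configuration along those lines (only the dict is shown, since building it needs no running broker; `kafka_server` is a placeholder for your broker address):

```python
kafka_server = "localhost:9092"  # placeholder broker address

# With auto-commit enabled, the consumer group's position is stored
# periodically in Kafka; after a restart the consumer resumes from that
# position, and auto.offset.reset only applies when no committed offset
# exists for the group.
consumer_conf = {
    "bootstrap.servers": kafka_server,
    "group.id": "testing",
    "auto.offset.reset": "latest",
    "enable.auto.commit": True,
}

print(consumer_conf["enable.auto.commit"])  # True
```

The dict would then be passed to `confluent_kafka.Consumer(consumer_conf)` as in the question.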

kus
  • How will committing the offset help? Won't the committed offset be that of the last read message, in which case the result is the same? – PRATIK CHAPADGAONKAR Apr 22 '21 at 17:44
  • Please refer to this excerpt from the Kafka documentation: "The committed position is the last offset that has been stored securely. Should the process fail and restart, this is the offset that the consumer will recover to. The consumer can either automatically commit offsets periodically, or it can choose to control this committed position manually by calling one of the commit APIs (e.g. commitSync and commitAsync)." – kus Apr 23 '21 at 01:57
  • I tried your suggestion and it seems to be working. Thanks. I read the kafka documentation and to me it seems to imply the opposite of what you said. I might be wrong. Could you elaborate and maybe try to illustrate how? – PRATIK CHAPADGAONKAR Apr 23 '21 at 13:02
  • I thought it was working. But apparently it is not. – PRATIK CHAPADGAONKAR Apr 23 '21 at 14:15

If you always want to read the latest messages, use a unique group.id for the consumer every time and make sure auto.offset.reset is latest.

You can use uuid to generate a random id on each start:

import uuid

Consumer = {
    "bootstrap.servers": kafka_server,
    "group.id": str(uuid.uuid4()),
    "auto.offset.reset": "latest",
}
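As a quick check of the idea (no broker required): every call to `uuid.uuid4()` yields a fresh id, so each program start joins Kafka as a brand-new consumer group with no committed offset, and auto.offset.reset=latest takes effect.

```python
import uuid

# Simulate two program starts: each one generates its own group id,
# so Kafka holds no committed offset for either group.
group_id_run_1 = str(uuid.uuid4())
group_id_run_2 = str(uuid.uuid4())

print(group_id_run_1 != group_id_run_2)  # True: independent consumer groups
```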
Ryuzaki L
  • I know that you can achieve that with a unique group id. However I was planning to run multiple instances of the program in parallel and using a statically defined group id is necessary to avoid replication – PRATIK CHAPADGAONKAR Apr 22 '21 at 17:41