52

producer sends messages 1, 2, 3, 4

consumer receives messages 1, 2, 3, 4

consumer crashes/disconnects

producer sends messages 5, 6, 7

consumer comes back up and should receive messages starting from 5 instead of 7

For this kind of result, which offset value do I have to use, and what other changes/configurations are needed?

Sat

2 Answers

79

When a consumer joins a consumer group it fetches the last committed offset, so it will resume reading from 5, 6, 7 if it committed the latest offset (so 4) before crashing. The earliest and latest values of the auto.offset.reset property are used when a consumer starts but there is no committed offset for the assigned partition. In that case you can choose whether to re-read all the messages from the beginning (earliest) or to read only the messages that arrive after the last one (latest).
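The rule described above can be sketched as a small pure function. This is a simplified model for illustration, not the Kafka client's code: `start_position` and its parameters are hypothetical names, and messages are numbered 1 to 7 as in the question. The committed value is stored as the position of the next message to read (5 after processing message 4), which is how Kafka records a committed offset.

```python
from typing import Optional


def start_position(committed: Optional[int], log_start: int, log_end: int,
                   auto_offset_reset: str) -> int:
    """Return the position of the first message a (re)starting consumer reads.

    committed: next position to read as stored by the group, or None.
    log_start: position of the oldest message still in the partition.
    log_end:   position the next produced message will get.
    """
    # A committed offset always wins; auto.offset.reset is not consulted.
    if committed is not None:
        return committed
    # Without a committed offset the reset policy decides.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    raise ValueError(f"policy not covered by this sketch: {auto_offset_reset}")


# Messages 1..7 exist, so the next produced message would be number 8.
LOG_START, LOG_END = 1, 8

for policy in ("earliest", "latest"):
    print(policy, "with commit:", start_position(5, LOG_START, LOG_END, policy))
    print(policy, "no commit:  ", start_position(None, LOG_START, LOG_END, policy))
```

With a committed position of 5, both policies return 5. Without one, earliest returns 1 and latest returns 8.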

ppatierno
  • The `Producer` sends messages continuously. I checked the offset value before stopping the `consumer`; it was 8023. After 10 minutes I started the `consumer` again, and the first offset value was 8020. Some time later I stopped the consumer again, when the offset value was `9239`; an hour later I started the `consumer`, and the first message offset was `9299`. I am setting a `groupId`, and `auto.offset.reset` is `latest`. I am also logging the `partition` value; it is always `0`. – Sat Jan 19 '18 at 09:39
  • 1
    So if you set it to latest, it will read 7. After it has committed 7, will it then read 6 and 5? Or is there a scenario in which they won't get processed if there is a constant stream of new records coming in with higher priority? – Yoker Jul 09 '19 at 20:16
  • 1
    When you commit an offset, it means that you have read all the previous messages. So committing 7 means that you won't read 6 and 5 next, but the new incoming message 8 sent by the producer. – ppatierno Jul 10 '19 at 05:44
  • 1
    I think @ppatierno did not answer the question. For Sat's question: the value for auto.offset.reset should be latest. With auto.offset.reset set to latest, two scenarios can happen. The first time the consumer subscribes to the topic, it will only receive messages that arrive after it subscribed. When the consumer reconnects to the topic (after a crash, for example), it will receive messages 5, 6, 7 because the last commit was 4. – Esca Tran Jun 24 '20 at 07:46
  • For @Yoker's question: the sequence of messages is immutable. The consumer will receive the messages in this order: 5, 6, 7 – Esca Tran Jun 24 '20 at 07:49
  • 1
    @EscaTran The answer is correct. https://docs.confluent.io/current/clients/consumer.html#:~:text=Second%2C%20use%20auto.,%E2%80%9D%20offset%20(the%20default). "After the consumer receives its assignment from the coordinator, it must determine the initial position for each assigned partition. When the group is first created, before any messages have been consumed, the position is set according to a configurable offset reset policy (auto.offset.reset). Typically, consumption starts either at the earliest offset or the latest offset." – Kumar Sambhav Jul 16 '20 at 13:40
-1

To get a clear idea about this scenario we need to understand what happens when a consumer joins the same consumer group.

  1. Join the consumer group, which triggers a rebalance and assigns partitions to the new consumer.
  2. Look for committed offsets of the partitions assigned to the consumer. If one exists, resume from it.
  3. Only if no committed offset exists, check the auto.offset.reset configuration parameter to decide where to start consuming messages from.

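Two values of the auto.offset.reset configuration matter here. They apply only when the group has no committed offset for the partition.

i. earliest - start consuming from the beginning of the partition. (In your example, starts from 1.)

ii. latest - start consuming from the end of the partition, so only messages produced after the consumer starts are received. (In your example, starts after 7.)

If a committed offset exists, as in your example where the consumer committed after message 4, consumption resumes from 5 with either value.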
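To make the three steps concrete, here is a minimal in-memory replay of the scenario from the question. It is a sketch under stated assumptions: `Partition`, `run_consumer`, and the group names are hypothetical, there is no real broker or Kafka client involved, offsets are 0-based list indices, and the consumer is assumed to commit after reading everything available.

```python
from typing import Dict, List


class Partition:
    """In-memory stand-in for one topic partition (illustrative only)."""

    def __init__(self) -> None:
        self.records: List[int] = []

    def send(self, *messages: int) -> None:
        self.records.extend(messages)


def run_consumer(partition: Partition, committed: Dict[str, int],
                 group_id: str, auto_offset_reset: str) -> List[int]:
    """Start a consumer, read everything available, commit, then stop."""
    # Step 1 (joining the group, partition assignment) is implicit:
    # the single partition always goes to this consumer.
    # Step 2: look for the group's committed offset.
    position = committed.get(group_id)
    # Step 3: auto.offset.reset applies only without a committed offset.
    if position is None:
        if auto_offset_reset == "earliest":
            position = 0
        else:
            position = len(partition.records)
    received = partition.records[position:]
    # Commit the offset of the next record to read.
    committed[group_id] = len(partition.records)
    return received


partition = Partition()
committed: Dict[str, int] = {}

run_consumer(partition, committed, "app", "latest")  # consumer starts first
partition.send(1, 2, 3, 4)
before_crash = run_consumer(partition, committed, "app", "latest")
partition.send(5, 6, 7)  # produced while the consumer is down
after_restart = run_consumer(partition, committed, "app", "latest")

# Groups with no committed offset fall back to the reset policy.
fresh_latest = run_consumer(partition, committed, "fresh-latest", "latest")
fresh_earliest = run_consumer(partition, committed, "fresh-earliest", "earliest")

print(before_crash, after_restart, fresh_latest, fresh_earliest)
```

In this model the restarted "app" group receives 5, 6, 7 even with `latest`, because its committed offset is found in step 2. Only the two fresh groups, which have nothing committed, behave differently from each other.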