
We have several micro-services using Spring Boot and Spring Cloud Stream Kafka binder to communicate between them.

Occasionally, we observe bursts of duplicate messages received by a consumer - often several days after they were first consumed and processed (successfully).

While I understand that Kafka does not guarantee exactly-once delivery, it still looks very strange, given that there were no rebalancing events or any 'suspicious' activity in the logs of either the brokers or the services. Since the consumer interacts with external APIs, it is a bit difficult to make it idempotent.
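One common mitigation, even when the external API itself cannot be made idempotent, is to deduplicate on the consumer side before calling it. A minimal sketch (the class name and the use of an in-memory set are illustrative only; a real implementation would need a persistent, shared store such as a database or Redis, keyed by a stable message ID):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: skip messages whose ID has already been processed.
// An in-memory set only survives for the life of one instance; production
// code would use a durable store shared across instances.
public class DedupingHandler {
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    /** Returns true if the message was processed, false if it was a duplicate. */
    public boolean handle(String messageId, Runnable externalApiCall) {
        if (!processedIds.add(messageId)) {
            return false; // duplicate delivery - skip the external call
        }
        externalApiCall.run();
        return true;
    }
}
```

This shields the external API from redeliveries regardless of why Kafka replays the records.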

Any hints what might be the cause of duplication? What should I be looking for to figure this out?

We are using Kafka broker 1.0.0, and this particular consumer uses Spring Cloud Stream Binder Kafka 2.0.0, which is based on kafka-client 1.0.2 (version of the other services might be a bit different).

Alex Glikson

1 Answer


You should show your configuration when asking questions like this.

Best guess is the broker's offsets.retention.minutes.

With modern broker versions (since 2.0), it defaults to 1 week; with older versions it was only one day.
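If that is the cause, the fix is to raise the retention on the broker side. A sketch of the relevant setting in the broker's `server.properties` (the value of 20160 is just an example; pick whatever exceeds your longest expected gap between commits):

```properties
# server.properties (Kafka broker)
# Default: 1440 (1 day) on pre-2.0 brokers, 10080 (7 days) on 2.0+
# Example: retain committed offsets for 2 weeks
offsets.retention.minutes=20160
```

Also consider setting the consumer's `auto.offset.reset` to `latest` if replaying old records is worse for you than skipping them when offsets are lost.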

Gary Russell
  • Thanks a lot, @gary-russell! Indeed, this parameter is not set, and it looks like the offsets are being reset and messages are re-consumed. However, the consumer group *is* running without long interruptions. Any idea what might happen that would cause the brokers to trigger the cleanup anyway? – Alex Glikson May 21 '20 at 06:30
  • 1
    See the long discussion on [this answer](https://stackoverflow.com/questions/61913232/kafka-fails-to-keep-track-of-last-commited-offset/61913961#61913961). It turns out that with older brokers, `offsets.retention.minutes` is simply measured from the last commit, and the offsets are reset even if the consumer is only stopped briefly (if it has not received any records in the last day). With newer brokers, the consumer has to be offline for the full `offsets.retention.minutes` before they are removed. I don't know exactly when it changed (but the property description changed in 2.1). – Gary Russell May 21 '20 at 13:36
  • 1
    It's actually worse than I thought; with a 1.0 broker, the offsets are removed even if the consumer(s) are still up. Not really a problem since the `position()` is still retained. However, if you have, say, two app instances sharing the partitions and the offset retention time elapses with no commits, the offsets are removed. If you then bring down one instance, a rebalance occurs and the records are all replayed on the remaining instance. This is not a problem with current brokers (tested with 2.4.1 and 2.5) because they are not removed until the consumers have been down that long. – Gary Russell May 21 '20 at 18:27
  • Thanks a lot! This is really helpful. We will definitely consider upgrading to a newer version of the brokers. Meanwhile, hopefully increasing the offsets retention period will help. – Alex Glikson May 22 '20 at 06:40