Duplicate messages when using Kafka MirrorMaker during problems on the source cluster

Question

We pull data from a remote Kafka cluster, which belongs to an external service, into our internal Kafka cluster using MirrorMaker. Recently one of the brokers in the external cluster went down for technical reasons, and the following appeared in the MirrorMaker logs:

...
ERROR [Consumer clientId=XXX-1, groupId=YYY] Offset commit failed on partition PARTITION_NAME at offset 123456: The coordinator is not aware of this member. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
WARN Failed to commit offsets because the consumer group has rebalanced and assigned partitions to another instance. If you see this regularly, it could indicate that you need to either increase the consumer's session.timeout.ms or reduce the number of records handled on each iteration with max.poll.records (kafka.tools.MirrorMaker$)
...

The consumers then reconnected to the surviving brokers in the cluster and continued reading messages. The problem is that while the broker on the external Kafka side was down, messages could be read but their offsets could not be committed. As a result, after the rebalance the consumers resumed from the last committed offsets, read the same messages again, and duplicates appeared in our internal cluster.
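To make the mechanism concrete, here is a minimal simulation of it. This is not MirrorMaker code and uses no Kafka client; the function `mirror` and its parameters `fail_commit_at` and `batch_size` are hypothetical names chosen for illustration. It assumes a single partition, a loop that produces to the target before committing the source offset, and one rejected commit after which reading resumes from the last committed offset.

```python
def mirror(source, fail_commit_at=None, batch_size=3):
    """Simulate an at-least-once mirror loop over one partition.

    source: list of messages, where the list index plays the role of the offset.
    fail_commit_at: hypothetical offset; the first commit that would cover it
        is rejected once, as if the group had rebalanced.
    Returns the list of messages written to the target cluster.
    """
    target = []
    committed = 0        # last successfully committed position
    position = 0         # in-memory read position of the consumer
    failed_once = False

    while position < len(source):
        batch = source[position:position + batch_size]
        target.extend(batch)          # messages reach the target first
        position += len(batch)

        commit_rejected = (
            fail_commit_at is not None
            and not failed_once
            and position > fail_commit_at
        )
        if commit_rejected:
            # The commit is lost, so reading restarts from the last
            # committed position and the batch is mirrored a second time.
            failed_once = True
            position = committed
            continue

        committed = position

    return target


messages = ["m0", "m1", "m2", "m3", "m4", "m5"]
print(mirror(messages))                    # no failure: each message once
print(mirror(messages, fail_commit_at=4))  # m3, m4, m5 appear twice
```

In this sketch the duplicates are exactly the messages that were produced to the target after the last successful commit, which matches what we see in the internal cluster.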

Is there any way to avoid duplicates in the internal cluster in this situation, other than the settings suggested in the log warning?

Perhaps there are consumer configuration parameters that would help with the duplicates.

Based on the log, you're not using MirrorMaker2? The old MirrorMaker has no exactly-once guarantee. Neither does v2, but it's architecturally different. - OneCricketeer, Oct 29 '22 at 22:49
Yes, it seems that the first version is used (by the way, how can I check the MM version?). Is there a config option in the second version (MM2) that avoids this behavior with duplicates? - Pablinho, Oct 31 '22 at 12:12
I don't think you can check the version externally, but the two have completely different startup scripts. I don't think either can guarantee no duplicates. See https://cwiki.apache.org/confluence/display/KAFKA/KIP-656%3A+MirrorMaker2+Exactly-once+Semantics - OneCricketeer, Oct 31 '22 at 17:41
