
I am trying to replicate a Kafka cluster with MirrorMaker 2.0, using the following mm2.properties:

name = mirror-site1-site2
topics = .*
connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
tasks.max = 1
plugin.path=/usr/share/java/kafka/plugin
clusters = site1, site2

# for demo, source and target clusters are the same
source.cluster.alias = site1
target.cluster.alias = site2

site1.sasl.mechanism=SCRAM-SHA-256
site1.security.protocol=SASL_PLAINTEXT
site1.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
   username="<someuser>" \
   password="<somepass>";

site2.sasl.mechanism=SCRAM-SHA-256
site2.security.protocol=SASL_PLAINTEXT
site2.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
   username="<someuser>" \
   password="<somepass>";

site1.bootstrap.servers = <IP1>:9093, <IP2>:9093, <IP3>:9093, <IP4>:9093
site2.bootstrap.servers = <IP5>:9093, <IP6>:9093, <IP7>:9093, <IP8>:9093

site1->site2.enabled = true
site1->site2.topics = topic1


# use ByteArrayConverter to ensure that records are not re-encoded
key.converter = org.apache.kafka.connect.converters.ByteArrayConverter
value.converter = org.apache.kafka.connect.converters.ByteArrayConverter

Here's the issue: MM2 always seems to replicate every message three times:

# Manual message production: 

 kafkacat -P -b <IP1>:9093,<IP2>:9093,<IP3>:9093,<IP4>:9093 -t "topic1"


# Result in the source topic (site1 cluster): 

% Reached end of topic topic1 [2] at offset 405
Message1
% Reached end of topic topic1 [2] at offset 406
Message2
% Reached end of topic topic1 [6] at offset 408
Message3
% Reached end of topic topic1 [2] at offset 407

 kafkacat -C -b <IP5>:9093,<IP6>:9093,<IP7>:9093,<IP8>:9093 -t "site1.topic1"

# Result in the target topic (site2 cluster): 

% Reached end of topic site1.titi [2] at offset 1216
Message1
Message1
Message1
% Reached end of topic site1.titi [2] at offset 1219
Message2
Message2
Message2
% Reached end of topic site1.titi [6] at offset 1229
Message3
Message3
Message3

I tried both the Confluent package and kafka_2.13-2.4.0 directly from Apache, each on Debian 10.1.

I first encountered this behaviour with Confluent 5.4 and thought it could be a bug in their package, since Confluent ships its own Replicator and may not care much about MM2. However, I reproduced exactly the same issue with an unmodified kafka_2.13-2.4.0 from Apache.

I'm aware that MM2 is not yet idempotent and can't guarantee exactly-once delivery. I tried many things in my tests, including producer tuning and larger batches of thousands of messages, and in every test MM2 duplicated every message exactly three times.

Did I miss something? Has anyone encountered the same thing? As a side note, legacy MirrorMaker 1 from the same packages does not have this issue.

I'd appreciate any help. Thanks!


Even though the changelog didn't make me very confident about an improvement, I tried running MM2 again, this time from Kafka 2.4.1. No change: the same strange duplications every time.

I installed this release on a new server to make sure the strange behaviour wasn't related to the server.

Since I use ACLs, do I need any special permissions? I granted "All", thinking nothing could be more permissive. Even though MM2 isn't idempotent, I'll also try the permissions related to idempotence.

What surprises me most is that I can't find any report of a similar issue. I must be doing something wrong, but what?

Bhargav Rao
Aurelien

2 Answers


You need to remove connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector from your configuration. This line tells MirrorMaker to use that class for the Heartbeat and Checkpoint connectors it generates alongside the Source connector that replicates data, so all three behave exactly like a Source connector. That's why each message is replicated three times: you've actually created three Source connectors.
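For reference, here is a sketch of the question's mm2.properties with only the top-level connector.class line removed; everything else, including the <someuser>, <somepass> and <IP…> placeholders, is copied unchanged. It assumes MM2 is launched in dedicated mode, for example with bin/connect-mirror-maker.sh mm2.properties (the launch command is an assumption; the question doesn't show how MM2 is started).

```properties
# Same as the question's mm2.properties, except that the top-level
# "connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector"
# line has been removed, so it no longer applies to all three connectors.
name = mirror-site1-site2
topics = .*
tasks.max = 1
plugin.path=/usr/share/java/kafka/plugin
clusters = site1, site2

# for demo, source and target clusters are the same
source.cluster.alias = site1
target.cluster.alias = site2

site1.sasl.mechanism=SCRAM-SHA-256
site1.security.protocol=SASL_PLAINTEXT
site1.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
   username="<someuser>" \
   password="<somepass>";

site2.sasl.mechanism=SCRAM-SHA-256
site2.security.protocol=SASL_PLAINTEXT
site2.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
   username="<someuser>" \
   password="<somepass>";

site1.bootstrap.servers = <IP1>:9093, <IP2>:9093, <IP3>:9093, <IP4>:9093
site2.bootstrap.servers = <IP5>:9093, <IP6>:9093, <IP7>:9093, <IP8>:9093

site1->site2.enabled = true
site1->site2.topics = topic1

# use ByteArrayConverter to ensure that records are not re-encoded
key.converter = org.apache.kafka.connect.converters.ByteArrayConverter
value.converter = org.apache.kafka.connect.converters.ByteArrayConverter
```

After restarting MM2 with this file, producing Message1 to topic1 on site1 should yield a single copy in site1.topic1 on site2 rather than three.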

dippas
FSAN
    This does not provide an answer to the question. You can [search for similar questions](//stackoverflow.com/search), or refer to the related and linked questions on the right-hand side of the page to find an answer. If you have a related but different question, [ask a new question](//stackoverflow.com/questions/ask), and include a link to this one to help provide context. See: [Ask questions, get answers, no distractions](//stackoverflow.com/tour) – dippas Jun 09 '20 at 13:39
    I know, I would've commented rather than answered, but I can't comment because I don't have enough rep yet, and if I create a new question it will probably be flagged as a duplicate and I think it would be better to give this existing question more visibility. I already tried to find similar questions but failed. Also, I read the guide to giving good answers and there is something in it about adding information even if you can't answer the question, and I think what I said provides some new information, so I thought this was the only way for me to share it with the OP. – FSAN Jun 09 '20 at 13:49
  • You're a life saver! The KIP-545 docs on how to configure MirrorMaker 2.0 in each of the three different running modes is confusing, to say the least. How can a single config entry lead to such a weird behaviour? This should at least be rejected by the config validation logic. Anyway, thank you so much! – jurgispods Aug 14 '20 at 13:20
  • I can confirm that removing that line was the solution. – Yaya Mar 03 '21 at 01:58

Enabling idempotence in the producer client config will fix the issue. It is set to false by default. Add the following to the mm2.properties file:

source.cluster.producer.enable.idempotence = true
target.cluster.producer.enable.idempotence = true