I want to copy all messages from a topic in one Kafka cluster to another. I ran Kafka MirrorMaker, but it seems to have copied only about half of the messages from the source cluster (I checked that there is no consumer lag on the source topic). The source cluster has 2 brokers; could that be related?

This is the source cluster config:

log.retention.ms=1814400000
transaction.state.log.replication.factor=2
offsets.topic.replication.factor=2
auto.create.topics.enable=true
default.replication.factor=2
min.insync.replicas=1
num.io.threads=8
num.network.threads=5
num.partitions=1
num.replica.fetchers=2
replica.lag.time.max.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
unclean.leader.election.enable=true
zookeeper.session.timeout.ms=18000

The source topic has 4 partitions and is not compacted. The MirrorMaker config is:

  • mirrormaker-consumer.properties
bootstrap.servers=broker1:9092,broker2:9092
group.id=picturesGroup3
auto.offset.reset=earliest
  • mirrormaker-producer.properties
bootstrap.servers=localhost:9092
max.in.flight.requests.per.connection=1
retries=2000000000
acks=all
max.block.ms=2000000000

Below are the stats from Kafdrop on the source cluster topic:

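| Partition | First Offset | Last Offset | Size | Leader Node | Replica Nodes | In-sync Replica Nodes | Offline Replica Nodes | Preferred Leader | Under-replicated |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 13659 | 17768 | 4109 | 1 | 1 | 1 | | Yes | No |
| 1 | 13518 | 17713 | 4195 | 2 | 2 | 2 | | Yes | No |
| 2 | 13664 | 17913 | 4249 | 1 | 1 | 1 | | Yes | No |
| 3 | 13911 | 18072 | 4161 | 2 | 2 | 2 | | Yes | No |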

and these are the stats for the target topic after Mirrormaker run:

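| Partition | First Offset | Last Offset | Size | Leader Node | Replica Nodes | In-sync Replica Nodes | Offline Replica Nodes | Preferred Leader | Under-replicated |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2132 | 4121 | 1989 | 1 | 1 | 1 | | Yes | No |
| 1 | 2307 | 4217 | 1910 | 1 | 1 | 1 | | Yes | No |
| 2 | 2379 | 4294 | 1915 | 1 | 1 | 1 | | Yes | No |
| 3 | 2218 | 4083 | 1865 | 1 | 1 | 1 | | Yes | No |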

As you can see, based on the Size column only about half of the source messages are in the target topic. What am I doing wrong?
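For reference, the "roughly half" figure can be reproduced from the posted numbers. The sketch below is plain arithmetic on the two Kafdrop tables, where Size equals Last Offset minus First Offset. The variable and function names are only illustrative. It also sums the target's last offsets, because the target partitions do not start at offset 0, so Size there counts only what is currently retained. This is not a diagnosis, just the numbers side by side.

```python
# Offsets copied from the Kafdrop tables in the question:
# partition -> (first_offset, last_offset)
source = {0: (13659, 17768), 1: (13518, 17713), 2: (13664, 17913), 3: (13911, 18072)}
target = {0: (2132, 4121), 1: (2307, 4217), 2: (2379, 4294), 3: (2218, 4083)}


def retained(offsets):
    """Messages currently retained per partition: last offset minus first offset."""
    return {p: last - first for p, (first, last) in offsets.items()}


source_total = sum(retained(source).values())
target_total = sum(retained(target).values())
# Sum of the target's last offsets, ignoring where each partition currently starts.
target_end_total = sum(last for _, last in target.values())

print("source retained:", source_total)
print("target retained:", target_total)
print("target / source:", round(target_total / source_total, 3))
print("target end offsets summed:", target_end_total)
```

The retained counts come out to 16714 on the source and 7679 on the target (a ratio of about 0.46), while the target's last offsets sum to 16715. If the target topic was empty before the run, that last figure would be the number of records ever written to it, so it may be worth checking why the target's first offsets are not 0.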

hitchhiker
  • Why aren't you using MirrorMaker2? – OneCricketeer Jan 09 '22 at 13:10
  • @OneCricketeer I'm using the mirrormaker from this image https://hub.docker.com/r/confluentinc/cp-kafka/ so the kafka version according to `./kafka-mirror-maker --version` is `7.0.0-ccs (Commit:c6d7e3013b411760)`. – hitchhiker Jan 09 '22 at 13:34
  • Don't use the broker image; the MirrorMaker1 scripts are essentially deprecated. The `cp-kafka-connect` image contains MirrorMaker2: https://github.com/apache/kafka/tree/trunk/connect/mirror#mirrormaker-20 – OneCricketeer Jan 09 '22 at 13:37
  • A little late, but are you using a transactional producer? If so, transaction markers are not replicated as messages (they are not messages per se), but they do take up space in the source topic. – rgo Mar 24 '22 at 16:17

1 Answer

I realized that the issue happened because I was copying data from a cluster with 2 brokers to a cluster with 1 broker. My assumption is that MirrorMaker1 copied data from only one broker of the source cluster. When I configured the target cluster with 2 brokers, all of the messages were copied to it.


Regarding @OneCricketeer's advice to use MirrorMaker2: this also worked, although it took me a while to arrive at a correct configuration file:

clusters = source, dest

source.bootstrap.servers = sourcebroker1:9092,sourcebroker2:9092
dest.bootstrap.servers = destbroker1:9091,destbroker2:9092
topics = .*
groups = mm2topic
source->dest.enabled = true
offsets.topic.replication.factor=1
offset.storage.replication.factor=1
auto.offset.reset=latest

In addition, MirrorMaker2 can be found in the connect container of this KafkaConnect project (enter the container; the /kafka/bin directory contains the connect-mirror-maker.sh executable).

A major downside of the MirrorMaker2 solution is that it adds a prefix to the topic names in the target cluster (in my case the new names would require changing application code). With the default replication policy the prefix is always added, so the only way I found to avoid it was to implement a custom Java class, as explained here.
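To make the prefix behaviour concrete, here is a small Python sketch of the naming rule. It is only an illustration, not MirrorMaker2's Java code: the function names and the topic name `pictures` are hypothetical, and it assumes the default policy joins the source cluster alias and the topic name with a `.` separator, while an identity-style policy leaves the name unchanged.

```python
def default_remote_topic(source_alias: str, topic: str, separator: str = ".") -> str:
    """Mimic the default naming: '<source alias><separator><topic>'."""
    return f"{source_alias}{separator}{topic}"


def identity_remote_topic(source_alias: str, topic: str) -> str:
    """Mimic an identity-style policy: the topic keeps its original name."""
    return topic


# 'source' is the cluster alias from the config above; 'pictures' is a made-up topic.
print(default_remote_topic("source", "pictures"))
print(identity_remote_topic("source", "pictures"))
```

With the `clusters = source, dest` config above, a topic mirrored under the default naming would therefore show up in the target cluster as `source.<topic>`, which is why consumers written against the original name stop matching.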

hitchhiker
  • From Kafka Connect cluster 7.x, you can use the replication policy `org.apache.kafka.connect.mirror.IdentityReplicationPolicy` to avoid the prefix. Otherwise, you can implement a prefixless policy by following the most upvoted answer at https://stackoverflow.com/questions/59390555/is-it-possible-to-replicate-kafka-topics-without-alias-prefix-with-mirrormaker2 – rgo Mar 24 '22 at 16:25