Problem scenario
We replicated topics from our source Kafka cluster to our target Kafka cluster; think of this as a merge. We have a Kafka cluster that we want to move off of, and the new cluster already contains other topics. Initially, the replication worked as expected, but we eventually ran into a failed replication attempt via MirrorMaker2. I have since deleted the affected topics on the destination side, expecting that I could restart MirrorMaker2 replication and that it would work. Unfortunately, what happens now for this failed attempt is that the topic shows up again on the target side, but the records underneath the topic are never replicated. I have now had a few failures with different replications, and in each case the same issue occurs: regardless of whether I delete or leave the topic on the target side, the records for that topic never show up. What are my options, or a path, for cleaning up a failed migration and starting again for the specific topics that did not succeed?
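For context, the destination-side cleanup was a plain topic deletion. The sketch below shows a roughly equivalent AdminClient call; the bootstrap address, credentials, and topic name are placeholders rather than the real values we used:

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

import java.util.Collections;
import java.util.Properties;

public class DeleteFailedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder connection details for the target cluster (SCRAM over TLS, matching the config below).
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "destination-broker-0.example.com:9093");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "SCRAM-SHA-256");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"admin\" password=\"<password>\";");

        try (Admin admin = Admin.create(props)) {
            // Delete the partially replicated copy on the target so MirrorMaker2 can recreate it.
            // "topic1.v1.orders" is a hypothetical topic name matching the "topic1.v1.*" pattern.
            admin.deleteTopics(Collections.singleton("topic1.v1.orders")).all().get();
        }
    }
}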
In our current configuration we are using the strimzi-kafka-operator version 0.29.0 and the MirrorMaker2 replicator version 3.0.0.
strimzi-kafka-operator configuration
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: strimzi-kafka
  namespace: strimzi-kafka
spec:
  interval: 5m
  url: https://strimzi.io/charts/
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: strimzi-kafka-operator
  namespace: strimzi-kafka
spec:
  chart:
    spec:
      chart: strimzi-kafka-operator
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: strimzi-kafka
      version: 0.29.0
  install:
    createNamespace: true
  interval: 5m0s
  releaseName: strimzi-kafka-operator
MirrorMaker2 replicator configuration
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: replicator
  namespace: strimzi-kafka
spec:
  version: 3.0.0
  replicas: 1
  logging:
    type: inline
    loggers:
      connect.root.logger.level: "INFO"
  resources:
    requests:
      cpu: 2000m
      memory: 8Gi
    limits:
      memory: 8Gi
  connectCluster: "my-target-cluster-d1"
  clusters:
    - alias: "my-source-cluster"
      bootstrapServers: source-broker-0.example.com:9093
      authentication:
        type: scram-sha-256
        username: admin
        passwordSecret:
          secretName: target-cluster-credentials
          password: password.txt
      tls:
        trustedCertificates: []
      config: {}
    - alias: "my-target-cluster-d1"
      bootstrapServers: destination-broker-0.example.com:9093
      authentication:
        type: scram-sha-256
        username: *****
        passwordSecret:
          secretName: target-cluster-credentials
          password: *****
      tls:
        trustedCertificates: []
      config:
        config.storage.replication.factor: 3
        offset.storage.replication.factor: 3
        status.storage.replication.factor: 3
        min.insync.replicas: 3
        offset.flush.timeout.ms: 10000
  mirrors:
    - sourceCluster: "my-source-cluster"
      targetCluster: "my-target-cluster-d1"
      sourceConnector:
        tasksMax: 4
        config:
          producer.override.batch.size: 327680
          producer.override.linger.ms: 100
          producer.request.timeout.ms: 30000
          consumer.fetch.max.bytes: 52428800
          consumer.max.partition.fetch.bytes: 1048576
          consumer.max.poll.records: 500
          auto.offset.reset: earliest
          consumer.auto.offset.reset: earliest
          my-source-cluster.consumer.auto.offset.reset: earliest
          replication.factor: 3
          offset-syncs.topic.replication.factor: 3
          sync.topic.acls.enabled: "false"
          replication.policy.separator: ""
          replication.policy.class: "io.strimzi.kafka.connect.mirror.IdentityReplicationPolicy"
      heartbeatConnector:
        config:
          producer.override.request.timeout.ms: 30000
          consumer.max.poll.interval.ms: 300000
          heartbeats.topic.replication.factor: 3
      checkpointConnector:
        config:
          producer.override.request.timeout.ms: 30000
          checkpoints.topic.replication.factor: 3
      topicsPattern: "topic1.v1.*" # topic pattern
We deleted the failed topics (and some already-replicated topics) from the destination cluster and re-applied the replication, but it does not work as expected. We expect the topics to be replicated with all of their records.
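One way we confirm that no records arrive is to check the end offsets of the recreated topic on the target cluster, which stay at 0 for every partition. A sketch of that check, again using placeholder connection details and a hypothetical topic name:

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartition;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class CheckTargetTopicOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder connection details for the target cluster (SCRAM over TLS).
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "destination-broker-0.example.com:9093");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "SCRAM-SHA-256");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"admin\" password=\"<password>\";");

        String topic = "topic1.v1.orders"; // hypothetical topic name matching the "topic1.v1.*" pattern

        try (Admin admin = Admin.create(props)) {
            // Look up the partitions of the recreated topic on the target cluster.
            TopicDescription description =
                    admin.describeTopics(Collections.singleton(topic)).all().get().get(topic);

            // Ask for the latest offset of each partition; 0 everywhere means no records were replicated.
            Map<TopicPartition, OffsetSpec> request = new HashMap<>();
            description.partitions().forEach(p ->
                    request.put(new TopicPartition(topic, p.partition()), OffsetSpec.latest()));

            admin.listOffsets(request).all().get().forEach((tp, info) ->
                    System.out.println(tp + " end offset = " + info.offset()));
        }
    }
}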