
I am trying to set up CDC-based replication for several databases (Amazon RDS for PostgreSQL) using multiple Debezium Kafka connectors running on MSK Connect. As I add more connectors, replication starts to cause problems: the OldestReplicationSlotLag metric increases across all RDS instances, which in turn increases TransactionLogsDiskUsage and the overall space occupied by each database. The replication slots still show as active in pg_replication_slots.
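To make the lag figure concrete, here is a minimal sketch of how per-slot lag could be derived from two LSN values, assuming they were read from pg_current_wal_lsn() and the confirmed_flush_lsn column of pg_replication_slots. The helper names and the sample LSNs are hypothetical and are not taken from the databases in question.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to an absolute byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_lag_bytes(current_wal_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL the slot's consumer has not yet confirmed (hypothetical helper)."""
    return lsn_to_int(current_wal_lsn) - lsn_to_int(confirmed_flush_lsn)


if __name__ == "__main__":
    # Sample values only; real ones would come from the database.
    lag = slot_lag_bytes("16/B374D848", "16/B3000000")
    print(f"slot lag: {lag / 1024 / 1024:.1f} MiB")
```

Comparing this number per slot over time would show whether every slot falls behind together or only the slot of a particular connector.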

The lag growth stops immediately when I shut down one of the replication connectors.

The MSK metrics look stable, and disk usage on the brokers is below 50%.

Can someone suggest what to check here and what the issue might be? Is it related to the scaling of a particular component?

I tried adjusting parameters such as producer.buffer.memory, offset.flush.timeout.ms, heartbeat.interval.ms and offset.flush.interval.ms. The current connector configuration is:

connector.class=io.debezium.connector.postgresql.PostgresConnector
max.queue.size=2048
key.converter.apicurio.registry.url=<apicurio url>
value.converter.apicurio.registry.url=<apicurio url>
incremental.snapshot.chunk.size=1000
slot.name=slot1
tasks.max=1
value.converter.apicurio.registry.auto-register=true
topic.prefix=cdc
database.sslmode=disable
signal.data.collection=public.debezium_signal
value.converter=io.apicurio.registry.utils.converter.AvroConverter
key.converter.apicurio.registry.auto-register=true
key.converter=io.apicurio.registry.utils.converter.AvroConverter
key.converter.apicurio.registry.find-latest=true
database.user=<db_user>
database.dbname=<dbname>
producer.buffer.memory=16000
offset.flush.timeout.ms=60000
heartbeat.interval.ms=1000
plugin.name=pgoutput
database.port=5432
value.converter.apicurio.registry.find-latest=true
offset.flush.interval.ms=30000
database.hostname=<db url>
auto.register.schema=true
database.password=<db password>
schema.name.adjustment.mode=avro
table.include.list=public.test_table1
max.batch.size=1024
snapshot.mode=never

The logs contain messages like the following, where the flush fails with only a few messages outstanding:

[Worker-0f0f9efcac7d7f830] [2023-06-07 09:57:22,538] ERROR [cdc-debezium-be-08|task-0|offsets] WorkerSourceTask{id=cdc-debezium-be-08-0} Failed to flush, timed out while waiting for producer to flush outstanding 3 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:509)
