
Note: self-answered question, because Google didn't shed any light on the problem.

I have configured a Managed Streaming for Kafka target for AWS Data Migration Service, but the migration job fails. Looking at the logs, I see this:

2021-11-17T18:45:21 kafka_send_record  (kafka_records.c:88)
2021-11-17T18:50:21 Message delivery failed with Error:[Local: Message timed out] [1026800]  (kafka_records.c:16)

I have verified the following:

  • Both DMS replication instance and MSK cluster use the same security group, with a "self ingress" rule that allows all traffic, and an egress rule that allows all traffic.
  • The endpoint connection test succeeds.
  • I can send a message to the MSK topic using the Kafka console producer from an EC2 instance in the same VPC (and receive this message with the console consumer).
  • The DMS job succeeds if I change the endpoint to use a self-managed Kafka cluster, running on an EC2 instance in the same VPC.
kdgregory

1 Answer


It turned out that the problem was that I pre-created the topic with a replication factor of 1, while the default MSK configuration specifies min.insync.replicas of 2, which is applied to all created topics.
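There are two ways to avoid this mismatch when pre-creating the topic: give it enough replicas to satisfy the cluster default, or override min.insync.replicas at the topic level. A sketch (the topic name and bootstrap address are placeholders):

```shell
# Option 1: enough replicas to satisfy the MSK default min.insync.replicas=2
kafka-topics.sh --bootstrap-server "$BOOTSTRAP" \
    --create --topic dms-target-topic \
    --partitions 1 --replication-factor 3

# Option 2: keep a single replica, but relax min.insync.replicas
# for this topic only via a topic-level config override
kafka-topics.sh --bootstrap-server "$BOOTSTRAP" \
    --create --topic dms-target-topic \
    --partitions 1 --replication-factor 1 \
    --config min.insync.replicas=1
```

Option 1 is the safer choice for production, since a single-replica topic loses data if its broker fails.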

When DMS sends a message, it requires acknowledgements from all in-sync replicas (I'm inferring this, as DMS is not open-source). This can never succeed if the minimum number of in-sync replicas exceeds the number of actual replicas.
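The failure condition reduces to a simple inequality, sketched here as a hypothetical helper (not DMS code): with acks=all, a write succeeds only while the in-sync replica count is at least min.insync.replicas, and the ISR can never be larger than the replication factor.

```python
def write_can_succeed(replication_factor: int, min_insync_replicas: int) -> bool:
    """Best case for an acks=all write: every replica is in sync.
    The broker rejects the write (NotEnoughReplicas) whenever the
    ISR count falls below min.insync.replicas, so if the replication
    factor itself is below that threshold, no write can ever succeed."""
    return replication_factor >= min_insync_replicas

# My topic: replication factor 1, against the MSK default min.insync.replicas=2
print(write_can_succeed(1, 2))  # False: every acks=all write times out
print(write_can_succeed(3, 2))  # True: succeeds while at least 2 replicas are in sync
```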

The Kafka console producer, however, defaults to a single ack. This means that it's not a great verification for MSK cluster usability.
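To make the console-producer test exercise the same code path as DMS, you can override its acks setting (a sketch; topic name and bootstrap address are placeholders):

```shell
# --producer-property overrides the console producer's defaults;
# acks=all reproduces the same timeout behavior that DMS hit
kafka-console-producer.sh --bootstrap-server "$BOOTSTRAP" \
    --topic dms-target-topic \
    --producer-property acks=all
```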

Semi-related: the MSK default for default.replication.factor is 3, which means that you over-replicate on a 2-node MSK cluster.
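If you do run a 2-broker cluster, a custom MSK cluster configuration can bring both defaults in line. An example fragment (values are assumptions for a 2-broker setup, not MSK-recommended settings):

```
# Custom MSK configuration for a 2-broker cluster
default.replication.factor=2
min.insync.replicas=1
```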
