What are the settings that affect the number of TCP connections made to Kafka? Background is that MSK IAM has a throttle limit.
Some things I can think of:
- tasks.max
- number of partitions
- number of brokers
- replication factor
There's no specific number.
For a rough estimate: of the items above, tasks.max is the only one you configure through the Connect API that matters. Each task starts its own set of consumer/producer instances, which by default communicate only with the brokers that lead the partitions they read from or write to.
Internally, the framework also produces to and consumes from the Connect status, offsets, and config topics. By default, some of those have many partitions (25 for the offsets topic), but a client keeps roughly one connection per broker it talks to rather than one per partition.
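After data reaches the partition leader, it's replicated within the cluster according to the replication factor (still over TCP, but between brokers), so the replication factor doesn't change how many connections the clients open.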
Some source connectors may additionally create an AdminClient connection in order to create topics ahead of writing the data.
Other connectors may use additional topics, such as a dead-letter queue (errors.deadletterqueue.topic.name, used together with errors.tolerance=all), or more specific ones like confluent.license.topic, Debezium's database history topic, or the MirrorMaker 2 heartbeats topic.
If you're using Confluent Schema Registry, it also maintains a _schemas topic (through the registry's own Kafka clients; the Connect converters talk to the registry over HTTP).
Finally, sink connectors' consumers commit their offsets to the __consumer_offsets topic.
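To turn that into a rough number, you can multiply things out. The sketch below is a back-of-envelope upper bound under my own simplifying assumptions, not something Kafka itself computes: each Kafka client opens at most one connection per broker plus one bootstrap connection at startup, and a consumer that joins a group (as sink task consumers do) opens one more to its group coordinator. Only the task clients are derived from tasks.max; otherClients is a hypothetical catch-all you would fill in for the internal-topic, admin, dead-letter-queue and similar clients listed above.

```java
/**
 * Back-of-envelope upper bound on client TCP connections from Kafka Connect.
 * Assumptions (simplifications, not Kafka internals): one connection per broker,
 * one bootstrap connection per client, one extra coordinator connection per
 * consumer that joins a group.
 */
public class ConnectionEstimate {

    /** Upper bound for a single producer, consumer or admin client. */
    static int perClient(int brokers, boolean joinsConsumerGroup) {
        int bootstrap = 1;                              // initial metadata connection
        int coordinator = joinsConsumerGroup ? 1 : 0;   // separate link to the group coordinator
        return brokers + bootstrap + coordinator;
    }

    /**
     * Task clients: one consumer per sink task (joins a group) or one producer
     * per source task. otherClients is a hypothetical count of everything else.
     */
    static int estimate(int brokers, int tasksMax, boolean sinkConnector, int otherClients) {
        int taskClients = tasksMax * perClient(brokers, sinkConnector);
        int extra = otherClients * perClient(brokers, false);
        return taskClients + extra;
    }

    public static void main(String[] args) {
        // Example: 3 brokers, tasks.max=10, a sink connector, 4 other clients.
        System.out.println(estimate(3, 10, true, 4)); // prints 66
    }
}
```

When all tasks start or rebalance at the same time, most of these connections are likely opened in a short burst, which is the situation a per-broker connection-rate throttle reacts to.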
For some of these, increasing internal client configs, such as consumer max.poll.records or producer batch.size, reduces how often requests are sent over the (long-lived) connections rather than how many connections are opened, at the expense of potentially losing or duplicating more data during errors or rebalances.
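For reference, a minimal sketch of where those client settings go in Kafka Connect, assuming a version with per-connector client overrides (KIP-458, Kafka 2.3+). Worker-wide values use the producer. and consumer. prefixes; per-connector values use producer.override. and consumer.override., which the worker only accepts if its connector.client.config.override.policy allows them. The class name and the numbers are illustrative, not recommendations.

```java
import java.util.Properties;

public class BatchingOverrides {

    /** Worker-level settings applied to every connector's task clients. */
    static Properties workerOverrides() {
        Properties p = new Properties();
        p.setProperty("consumer.max.poll.records", "2000"); // client default is 500
        p.setProperty("producer.batch.size", "65536");      // client default is 16384 bytes
        return p;
    }

    /** Per-connector form, placed in the connector's own config. */
    static Properties connectorOverrides() {
        Properties p = new Properties();
        p.setProperty("consumer.override.max.poll.records", "2000");
        p.setProperty("producer.override.batch.size", "65536");
        return p;
    }

    public static void main(String[] args) {
        workerOverrides().forEach((k, v) -> System.out.println(k + "=" + v));
        connectorOverrides().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```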
In my case we were seeing an error like the one below. It looks like we were getting a SASL token back from MSK, but the EC2 instance metadata service was throttling the client's requests for the AWS credentials needed to evaluate it. It turns out this failure is not retried by the Kafka client's reconnect.backoff.ms and reconnect.backoff.max.ms logic (https://kafka.apache.org/documentation/#producerconfigs_reconnect.backoff.ms), which is what the MSK documentation points you to for coping with the new TCP connection throttling that applies when IAM authentication is enabled on your MSK cluster (https://docs.aws.amazon.com/msk/latest/developerguide/limits.html#msk-provisioned-quota).
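For context on what those two settings do: the Kafka documentation describes reconnect.backoff.max.ms as making the per-host backoff grow exponentially with each consecutive connection failure, up to that maximum, with 20% random jitter added. The sketch below approximates that documented behaviour (starting at reconnect.backoff.ms and doubling per failure is my assumption about the growth factor); it is not the client's actual code, and the class name and values are hypothetical.

```java
import java.util.Random;

public class ReconnectBackoffSketch {

    /**
     * Approximate delay before the next reconnect attempt: base * 2^failures,
     * capped at maxMs, then scaled by a random factor in [0.8, 1.2).
     */
    static long backoffMs(long baseMs, long maxMs, int failures, Random rnd) {
        double exponential = baseMs * Math.pow(2, failures);
        double capped = Math.min(exponential, maxMs);
        double jitter = 1.0 + (rnd.nextDouble() * 0.4 - 0.2); // +/-20%
        return (long) (capped * jitter);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Example values only: reconnect.backoff.ms=1000, reconnect.backoff.max.ms=10000
        for (int failures = 0; failures < 6; failures++) {
            long wait = backoffMs(1000, 10000, failures, rnd);
            System.out.println("attempt " + (failures + 1) + ": wait ~" + wait + " ms");
        }
    }
}
```

This spacing only applies to connection attempts the client retries; in the failure above, the client instead moved to the AUTHENTICATION_FAILED state.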
We are using the Java library aws-msk-iam-auth. I found that it has retry logic, with exponential backoff and jitter, for transient errors when fetching AWS credentials on the client; you enable it through options in your JAAS config string:
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required awsMaxRetries="7" awsMaxBackOffTimeMs="500";
https://github.com/aws/aws-msk-iam-auth#retries-while-getting-credentials
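A minimal sketch of putting that together for a Kafka Connect worker, assuming the other client settings from the aws-msk-iam-auth README (security.protocol=SASL_SSL, sasl.mechanism=AWS_MSK_IAM, and the IAMClientCallbackHandler callback class). As I understand it, Connect's task and admin clients don't pick up the worker's own security settings, so the same keys are repeated under the producer., consumer., and admin. prefixes. The class and method names are hypothetical, and the retry values are just the ones from the line above.

```java
import java.util.Properties;

public class MskIamClientProps {

    /** Client settings for IAM auth, with the credential-fetch retry options. */
    static Properties baseProps() {
        Properties p = new Properties();
        p.setProperty("security.protocol", "SASL_SSL");
        p.setProperty("sasl.mechanism", "AWS_MSK_IAM");
        p.setProperty("sasl.jaas.config",
                "software.amazon.msk.auth.iam.IAMLoginModule required "
                        + "awsMaxRetries=\"7\" awsMaxBackOffTimeMs=\"500\";");
        p.setProperty("sasl.client.callback.handler.class",
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler");
        return p;
    }

    /** Copies the settings to the worker's own clients and to task/admin clients. */
    static Properties forConnectWorker() {
        Properties base = baseProps();
        Properties worker = new Properties();
        for (String prefix : new String[] {"", "producer.", "consumer.", "admin."}) {
            for (String key : base.stringPropertyNames()) {
                worker.setProperty(prefix + key, base.getProperty(key));
            }
        }
        return worker;
    }

    public static void main(String[] args) {
        forConnectWorker().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```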
org.apache.kafka.common.errors.SaslAuthenticationException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: Failed to find AWS IAM Credentials [Caused by aws_msk_iam_auth_shadow.com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [aws_msk_iam_auth_shadow.com.amazonaws.auth.AWSCredentialsProviderChain@1d00a730: Unable to load AWS credentials from any provider in the chain: [EnvironmentVariableCredentialsProvider: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), SystemPropertiesCredentialsProvider: Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey), WebIdentityTokenCredentialsProvider: You must specify a value for roleArn and roleSessionName, software.amazon.msk.auth.iam.internals.EnhancedProfileCredentialsProvider@6dff3234: Profile file contained no credentials for profile 'default': ProfileFile(profiles=[]), aws_msk_iam_auth_shadow.com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper@7aaa946f: Too Many Requests (Service: null; Status Code: 429; Error Code: null; Request ID: null; Proxy: null)]]]) occurred when evaluating SASL token received from the Kafka Broker. Kafka Client will go to AUTHENTICATION_FAILED state.
I'm not sure this is exactly what prompted the original question, but it brought me here after many other dead ends. Hopefully this helps someone else: the MSK documentation only mentioned the Kafka Connect settings, which were ineffective in this scenario, and it took me a lot of time and frustration to discover the settings in the aws-msk-iam-auth library.