I have a 9-node Kafka cluster hosted on AWS MSK and I'm using the confluent-kafka library with Python.
While producing to a topic I get a lot of timeout errors like:
%5|1684850550.061|REQTMOUT|eb3004f09ce7#producer-1| [thrd:sasl_ssl://broker1.a]: sasl_ssl://broker1.amazonaws.com:9096/3: Timed out ProduceRequest in flight (after 1391ms, timeout #0): possibly held back by preceeding ProduceRequest with timeout in 58184ms
%3|1684850430.948|FAIL|f814d85051d8#producer-1| [thrd:sasl_ssl://broker1.a]: sasl_ssl://broker1.amazonaws.com:9096/3: 1 request(s) timed out: disconnect (after 4493ms in state UP)
%4|1684858183.537|REQTMOUT|6521878ea69c#producer-1| [thrd:sasl_ssl://broker1.a]: sasl_ssl://broker1.amazonaws.com:9096/3: Timed out 6 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests
My producer config is:
"batch.size": 5_242_880,
"client.id": socket.gethostname(),
"compression.codec": "gzip",
"linger.ms": 40,
"message.max.bytes": 5_242_880,
"batch.num.messages": 1_000_000,
"queue.buffering.max.messages": 10_000_000,
My broker config is:
auto.create.topics.enable = false
group.initial.rebalance.delay.ms = 3
log.retention.ms = 300000
log.segment.bytes = 1073741824
message.max.bytes = 10485760
min.insync.replicas = 2
num.io.threads = 32
num.network.threads = 1500
num.recovery.threads.per.data.dir = 8
num.replica.fetchers = 2
offsets.retention.minutes = 180
offsets.topic.replication.factor = 3
replica.fetch.max.bytes = 10485760
replica.fetch.response.max.bytes = 10485760
replica.socket.receive.buffer.bytes = 10485760
socket.receive.buffer.bytes = 10485760
socket.request.max.bytes = 104857600
socket.send.buffer.bytes = 10485760
transaction.state.log.min.isr = 1
transaction.state.log.replication.factor = 3
unclean.leader.election.enable = true
zookeeper.connection.timeout.ms = 300000
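To rule out a simple size mismatch between the two configs, here is a rough sanity check I put together. It only compares the size limits copied from the settings above; the helper name `size_limit_violations` is my own, and the rule that `replica.fetch.max.bytes` should be at least the broker's `message.max.bytes` is a commonly cited guideline I am assuming here, not something I have confirmed for MSK.

```python
# Values copied from the producer and broker configs in this post.
producer_cfg = {
    "batch.size": 5_242_880,
    "message.max.bytes": 5_242_880,
}
broker_cfg = {
    "message.max.bytes": 10_485_760,
    "replica.fetch.max.bytes": 10_485_760,
    "socket.request.max.bytes": 104_857_600,
}


def size_limit_violations(producer, broker):
    """Return a list of human-readable size-limit problems (empty if none)."""
    problems = []
    # The largest request the producer may send must fit the broker's message limit.
    if producer["message.max.bytes"] > broker["message.max.bytes"]:
        problems.append("producer message.max.bytes exceeds broker message.max.bytes")
    # Assumed guideline: followers should be able to fetch the largest accepted message.
    if broker["message.max.bytes"] > broker["replica.fetch.max.bytes"]:
        problems.append("broker message.max.bytes exceeds replica.fetch.max.bytes")
    # A request larger than the socket limit would be rejected outright.
    if producer["message.max.bytes"] > broker["socket.request.max.bytes"]:
        problems.append("producer message.max.bytes exceeds socket.request.max.bytes")
    return problems


violations = size_limit_violations(producer_cfg, broker_cfg)
print(violations or "no size-limit mismatch found")
```

With the values above the check finds no mismatch, so the size limits alone do not seem to explain the timeouts.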
confluent-kafka version: 2.1.1 (latest)
Could you suggest which settings I should adjust to avoid the problem? Could it be a back-pressure problem? (I'm dealing with a large amount of data per second.)
I tried adjusting the above parameters without any result.
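To test the back-pressure theory, this is a minimal sketch of how I could wrap the produce call. It assumes `producer` is a `confluent_kafka.Producer` built from the config above; `produce()` raises `BufferError` when the local queue is full, and `poll()` serves delivery callbacks, which frees queue space. The helper name `produce_with_backpressure` and the `max_wait_s` / `poll_s` parameters are hypothetical names of mine, not part of the library.

```python
import time


def produce_with_backpressure(producer, topic, value, max_wait_s=30.0, poll_s=0.5):
    """Enqueue one message, waiting for queue space instead of failing at once.

    `producer` is assumed to be a confluent_kafka.Producer (or anything with
    compatible produce() and poll() methods). Returns how many times the
    local queue was found full before the message was accepted.
    """
    deadline = time.monotonic() + max_wait_s
    full_queue_hits = 0
    while True:
        try:
            producer.produce(topic, value=value)
            producer.poll(0)  # serve delivery callbacks without blocking
            return full_queue_hits
        except BufferError:
            # Local queue is full: the brokers are acknowledging more slowly
            # than messages are being enqueued.
            if time.monotonic() >= deadline:
                raise
            full_queue_hits += 1
            producer.poll(poll_s)  # wait for in-flight deliveries to free space
```

If the returned count is frequently non-zero, the local queue is filling faster than the brokers acknowledge, which would point towards back-pressure rather than a pure network problem.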