[1] 2022-01-18 21:56:10,280 ERROR [org.apa.cam.pro.err.DefaultErrorHandler] (Camel (camel-1) thread #9 - KafkaProducer[test]) Failed delivery for (MessageId: 95835510BC9E9B2-0000000000134315 on ExchangeId: 95835510BC9E9B2-0000000000134315). Exhausted after delivery attempt: 1 caught: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0:121924 ms has passed since batch creation
[1]
[1] Message History (complete message history is disabled)
[1] ---------------------------------------------------------------------------------------------------------------------------------------
[1] RouteId              ProcessorId          Processor                                                                        Elapsed (ms)
[1] [route1            ] [route1            ] [from[netty://udp://0.0.0.0:8080?receiveBufferSize=65536&sync=false]           ] [    125320]
[1]     ...
[1] [route1            ] [to1               ] [kafka:test?brokers=10.99.155.100:9092&producerBatchSize=0                     ] [         0]
[1]
[1] Stacktrace
[1] ---------------------------------------------------------------------------------------------------------------------------------------
[1] : org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0:121924 ms has passed since batch creation

Here's the flow for my project:

  1. External Service ---> Netty
  2. Netty ---> Kafka(consumer)
  3. Kafka(producer) ---> processing events

1 and 2 are running in one Kubernetes pod and 3 is running in a separate pod.

Early on, I encountered a TimeoutException like this:

org.apache.kafka.common.errors.TimeoutException: Expiring 20 record(s) for test-0:121924 ms has passed since batch creation

I searched online and found a couple of potential solutions, e.g. "Kafka Producer error Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time".

Based on the suggestions, I have:

  1. made the timeout bigger, doubling the default value
  2. set the batch size to 0, so events are not sent in batches, which keeps memory usage low (see the sketch below)
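
Concretely, the endpoint after those changes looks roughly like this (a sketch in the Camel Java DSL, inside a RouteBuilder; I am assuming the doubled timeout is request.timeout.ms, exposed as the Camel Kafka option requestTimeoutMs, whose default is 30000 ms):

    from("netty:udp://0.0.0.0:8080?receiveBufferSize=65536&sync=false")
        .to("kafka:test?brokers=10.99.155.100:9092"
            + "&producerBatchSize=0"       // send records one by one, no batching
            + "&requestTimeoutMs=60000");  // doubled from the 30000 ms default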

Unfortunately I still encounter the error, and memory still gets used up.

Does anyone know how to solve it? Thanks!

Haifeng Zhang
  • What do you mean "memory is used up"? – OneCricketeer Jan 19 '22 at 04:29
  • @OneCricketeer The Netty producer receives events faster than the consumer can consume them, so all the unprocessed events accumulate in memory and eventually crash the server/pod. – Haifeng Zhang Jan 26 '22 at 15:46
  • Netty doesn't send data to a Kafka consumer process, though. – OneCricketeer Jan 26 '22 at 15:58
  • How do you use Netty with Kafka(consumer)? Do you write all the data received from Netty into a Kafka topic? That is what a Kafka(producer) does. And what do you mean by "Kafka(producer) ---> processing events"? Do you consume the data from Kafka to process it further? – bvn13 Jan 29 '22 at 18:58
  • As I understand it, you have an issue because Netty produces data much faster than the Kafka producer can write it into the Kafka topic. Did you try the Throttle EIP? https://camel.apache.org/components/3.14.x/eips/throttle-eip.html – bvn13 Jan 29 '22 at 19:07
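
For reference, a minimal sketch of that Throttle EIP inside the existing route (Camel Java DSL; the 500 msg/s rate is purely illustrative):

    from("netty:udp://0.0.0.0:8080?receiveBufferSize=65536&sync=false")
        .throttle(500).timePeriodMillis(1000)  // allow at most 500 exchanges per second
        .to("kafka:test?brokers=10.99.155.100:9092");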

1 Answer


There are several things to take into account here. You are not showing what your throughput is; you have to take that value into account and check whether the broker on 10.99.155.100:9092 is able to process such a load. Did you monitor 10.99.155.100 during the transfer? The fact that Kafka can potentially process hundreds of thousands of messages per second doesn't mean that you can do it on any hardware.

Having said that, the timeout is the first thing that comes to mind, but in your case it is already 2 minutes and you are still timing out. To me, this sounds more like a problem on your broker than on your producer.

To understand the issue: basically, you are filling your mouth faster than you can swallow. By the time you push a message, the broker is not able to acknowledge it in time (in this case, within 2 minutes).

What you can do here (a configuration sketch follows the list):

  • Check the broker performance for the given load.
  • Change your delivery.timeout.ms to an acceptable value; I guess you have SLAs to meet.
  • Increase your retry backoff timer (retry.backoff.ms).
  • Do not set the batch size to 0; this attempts a live push to the broker, which in this case does not seem possible for the load.
  • Make sure your max.block.ms is set correctly.
  • Change to bigger batches (even if this increases latency), but not too big; you need to sit down, check how many records you are pushing, and size the batches accordingly.
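
For orientation, here is roughly what those knobs look like on a plain Kafka producer (the values are illustrative only, not a recommendation; tune them against your measured load and SLAs):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "10.99.155.100:9092");
    props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000"); // total time to report success or failure
    props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "30000");   // per-request wait for a broker response
    props.put(ProducerConfig.LINGER_MS_CONFIG, "50");               // wait up to 50 ms to fill a batch
    props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "500");       // pause between retries
    props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "60000");         // max time send() may block
    props.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536");           // 64 KB batches instead of 0

With the Camel Kafka component these map, if I remember the option names correctly, to the endpoint options deliveryTimeoutMs, requestTimeoutMs, lingerMs, retryBackoffMs, maxBlockMs and producerBatchSize.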

Now, some rules:

  • delivery.timeout.ms must be bigger than the sum of request.timeout.ms and linger.ms (see the worked check below).
  • All of the above are impacted by batch.size.
  • If you don't have that many rows, but those rows are huge, then control max.request.size.
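
For example, with the values in the sketch above: 120000 (delivery.timeout.ms) >= 30000 (request.timeout.ms) + 50 (linger.ms), so that configuration is consistent. If I recall correctly, the producer client even refuses to start (throws a ConfigException) when this inequality is violated.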

So, to summarize, your properties to change are the following:

delivery.timeout.ms, request.timeout.ms, linger.ms, max.request.size

Assuming the hardware is good, and that you are not sending more than you should, those should do the trick.

Marco