I want my consumers to process large batches, so I want the consumer listener to wake up on, say, 1800 MB of data or every 5 minutes, whichever comes first.
Mine is a Spring Boot Kafka application, the topic has 28 partitions, and this is the configuration I explicitly changed:
| Parameter | Value I set | Default value | Why I set it this way |
|---|---|---|---|
| fetch.max.bytes | 1801 MB | 50 MB | fetch.min.bytes + 1 MB |
| fetch.min.bytes | 1800 MB | 1 byte | desired batch size |
| fetch.max.wait.ms | 5 min | 500 ms | desired cadence |
| max.partition.fetch.bytes | 1801 MB | 1 MB | unbalanced partitions |
| request.timeout.ms | 5 min + 1 sec | 30 sec | fetch.max.wait.ms + 1 sec |
| max.poll.records | 10000 | 500 | 1500 found too low |
| max.poll.interval.ms | 5 min + 1 sec | 5 min | fetch.max.wait.ms + 1 sec |
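
To make the values above unambiguous, here is a minimal sketch of the same settings as raw consumer properties, in bytes and milliseconds. It uses only `java.util.Properties` and the plain Kafka property names. The class name `BatchConsumerConfigSketch` and the method `build()` are made up for illustration and are not part of my application, where the values go through the Spring Boot consumer configuration. These settings are int-typed in the Kafka consumer, so `Math.toIntExact` is there to check that the values still fit in an int.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only: mirrors the table above.
class BatchConsumerConfigSketch {
    static final long MB = 1024L * 1024L;

    static Properties build() {
        long fetchMinBytes = 1800 * MB;           // desired batch size
        long fetchMaxBytes = fetchMinBytes + MB;  // fetch.min.bytes + 1 MB
        long fetchMaxWaitMs = 5 * 60 * 1000L;     // desired cadence: 5 min
        long waitPlusOneSecMs = fetchMaxWaitMs + 1000L;

        Properties props = new Properties();
        // toIntExact throws ArithmeticException if a value overflows an int.
        props.setProperty("fetch.min.bytes", String.valueOf(Math.toIntExact(fetchMinBytes)));
        props.setProperty("fetch.max.bytes", String.valueOf(Math.toIntExact(fetchMaxBytes)));
        props.setProperty("max.partition.fetch.bytes", String.valueOf(Math.toIntExact(fetchMaxBytes)));
        props.setProperty("fetch.max.wait.ms", String.valueOf(Math.toIntExact(fetchMaxWaitMs)));
        props.setProperty("request.timeout.ms", String.valueOf(Math.toIntExact(waitPlusOneSecMs)));
        props.setProperty("max.poll.interval.ms", String.valueOf(Math.toIntExact(waitPlusOneSecMs)));
        props.setProperty("max.poll.records", "10000");
        return props;
    }

    public static void main(String[] args) {
        // Print the resolved values so they can be compared with the consumer's startup log.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Printing the resolved values makes it easy to compare them against the `ConsumerConfig values` block that the Kafka client logs at startup, which should show whether the overrides actually reached the consumer.
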
Nevertheless, when I produce ~2 GB of data to the topic, I see that the consumer listener (a batch listener) is called many times per second, far more often than the desired rate.
I logged the serialized size of the ConsumerRecords<?,?> argument and found that it is never more than 55 MB.
This suggests that I was not able to raise fetch.max.bytes above the default of 50 MB.
Any idea how I can troubleshoot this?
Edit: I found this question: "Kafka MSK - a configuration of high fetch.max.wait.ms and fetch.min.bytes is behaving unexpectedly".
Is it really impossible, as stated there?