
How do I force a batch listener to read a fixed number of messages from Kafka? The service reads a random amount instead of the specified one.

In the Kafka settings I specified the option `max-poll-records: "500"`:

spring:
   main:
     allow-bean-definition-overriding: true
   kafka:
     listener:
       type: batch
     consumer:
       enable-auto-commit: true
       auto-offset-reset: latest
       group-id: my-app
       max-poll-records: "500"
       fetch-min-size: "1000MB"
     bootstrap-servers: "localhost:9092"

which sets the maximum number of messages to be read in a single poll (500 messages),

and I specified a second parameter, setIdleBetweenPolls(5000), in Kafka's configuration class:

 @Bean
 public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactoryBatch(
     ConsumerFactory<String, String> consumerFactory) {
     ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
     var properties = new HashMap<String, Object>();

     properties.putAll(consumerFactory.getConfigurationProperties());
     factory.setConsumerFactory(new DefaultKafkaConsumerFactory<>(properties));
     factory.setBatchListener(true);
      factory.getContainerProperties().setIdleBetweenPolls(5000L); // takes a long, not a String
     return factory;
 }

This sets the idle interval between polls to 5 seconds.

That is, every 5 seconds the service should read 500 messages from Kafka, then another 500, then another 500, and so on.

Main problem: when I send 20, 50, or 100 messages to Kafka, there is no problem: the service reads all the messages at once. But if I send 500 messages, or, for example, 10 000 messages, then the service reads a random number of them, not necessarily 500. It can read 500 messages at a time, or fewer (for example, 200 and 300, or 150 and 300 and 50), etc.
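The symptom can be illustrated with a toy model of `poll()` (plain Java, not the Kafka API): a poll returns whatever has arrived at the consumer so far, capped at `max.poll.records`, which is exactly how counts like 200, 300, or 150 can appear.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class PollModel {
    static final int MAX_POLL_RECORDS = 500;

    // Toy poll(): drains whatever has arrived so far, up to the cap.
    static List<Integer> poll(Queue<Integer> buffered) {
        List<Integer> batch = new ArrayList<>();
        while (batch.size() < MAX_POLL_RECORDS && !buffered.isEmpty()) {
            batch.add(buffered.poll());
        }
        return batch;
    }

    public static void main(String[] args) {
        Queue<Integer> broker = new ArrayDeque<>();
        // Only 230 records have arrived when the consumer polls:
        for (int i = 0; i < 230; i++) broker.add(i);
        System.out.println(poll(broker).size()); // 230, not 500

        // 800 more records arrive before the next poll:
        for (int i = 0; i < 800; i++) broker.add(i);
        System.out.println(poll(broker).size()); // capped at 500
    }
}
```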

P.S.: I have dug through a lot of information on the Internet and I still don't understand how to fix this problem, or whether it is even possible. Please share your opinion and a possible solution.

Thank you all in advance!

Kirill Sereda
  • It could be due to the concurrency. Your service is just _too fast_ for your use case. Try to set concurrency = 1. Hope it helps. – Mar-Z Jun 27 '23 at 13:32
  • I don't think that `max.poll.records` is the only indicator of the batch size. See the other relevant options: https://kafka.apache.org/documentation/#consumerconfigs. For example, `fetch.max.bytes`. See some discussion here: https://stackoverflow.com/questions/51753883/increase-the-number-of-messages-read-by-a-kafka-consumer-in-a-single-poll – Artem Bilan Jun 27 '23 at 13:59
  • I read your links and tried adding fetch-max-bytes: "102428800" and max-partition-fetch-bytes: "104857600" (as the discussion says), but it didn't help. The service still doesn't always read 500 messages from Kafka; the count is still random :( – Kirill Sereda Jun 28 '23 at 06:26
  • Do you have any ideas? – Kirill Sereda Jul 16 '23 at 11:29

1 Answer


There is no `min.poll.records`, only a max.

You have some control over the batch size, but not via a record count.

See `fetch.min.bytes` and `fetch.max.wait.ms`.

`fetch.min.bytes` is 1 by default, so a poll will often return fewer records than the max.
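A minimal sketch of consumer properties leaning on those two settings (the values are illustrative, not tuned recommendations; plain string keys are used so the snippet stands alone, but they match the constants in `org.apache.kafka.clients.consumer.ConsumerConfig`):

```java
import java.util.HashMap;
import java.util.Map;

public class FetchTuning {
    public static Map<String, Object> consumerProps() {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-app");
        // Upper bound only: poll() never returns MORE than this many records,
        // but it may return fewer.
        props.put("max.poll.records", 500);
        // Ask the broker to hold the fetch until ~1 MB of data is available...
        props.put("fetch.min.bytes", 1_048_576);
        // ...or until 500 ms have elapsed, whichever comes first.
        props.put("fetch.max.wait.ms", 500);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps());
    }
}
```

The trade-off: a larger `fetch.min.bytes` makes fuller batches more likely, but `fetch.max.wait.ms` still caps how long the broker will wait, so a fixed record count per poll is never guaranteed.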

Gary Russell