
I have a Kafka consumer running in a Spring application.

I am trying to configure the consumer with `fetch.max.wait.ms` and `fetch.min.bytes`.

I would like the consumer to wait until there are 15000000 bytes of messages or 1 minute has passed.

consumerProps.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 60000);
consumerProps.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 15000000);
factory.setConsumerFactory(new DefaultKafkaConsumerFactory<>(consumerProps));

I know this configuration has an effect, because once it was set I started getting `org.apache.kafka.common.errors.DisconnectException`.

To resolve it, I increased `request.timeout.ms`:

consumerProps.put(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, 120000);

This resolved the errors, but the behavior is not as expected:

The consumer is picking up messages very often, in small amounts nowhere near `fetch.min.bytes`.

Within a single minute it will sometimes perform multiple fetches.

It works OK on my local dev environment when I test it with Spring EmbeddedKafka, but not in production (MSK).

What can explain this? Is it possible it doesn't work well on MSK?

Are there other properties that play a role here or can be in the way?

Is it correct to say that, assuming I am always under `fetch.min.bytes`, I won't see more than one fetch per minute?

What happens when new records are written while records are being polled? Does that affect the current poll or the next one?

(Other properties defined for this consumer: `session.timeout.ms`, `max.poll.records`, `max.partition.fetch.bytes`.)
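For reference, a minimal sketch of how these settings fit together. The bootstrap server and group id are placeholders, and I am using the literal property names instead of the `ConsumerConfig` constants so the snippet stands alone; the constraint the snippet encodes is the one observed above, namely that `request.timeout.ms` must stay above `fetch.max.wait.ms` to avoid the `DisconnectException`:

```java
import java.util.Properties;

// Sketch of the consumer settings discussed above.
// bootstrap.servers and group.id are placeholders.
public class ConsumerSettings {
    static Properties build() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");  // placeholder
        p.put("group.id", "example-group");            // placeholder
        p.put("fetch.max.wait.ms", 60_000);            // wait up to 1 minute...
        p.put("fetch.min.bytes", 15_000_000);          // ...or until ~15 MB accumulate
        // Must exceed fetch.max.wait.ms, otherwise the client gives up on the
        // in-flight fetch and logs DisconnectException.
        p.put("request.timeout.ms", 120_000);
        return p;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```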

====== EDIT ======

After some investigation I discovered something: the configuration works as expected when the consumer reads from a topic with a single partition.

When working against a topic with multiple partitions, the fetch timing becomes unpredictable.
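A possible explanation (my reading of the Kafka consumer docs, not something confirmed for MSK specifically): `fetch.min.bytes` and `fetch.max.wait.ms` apply per fetch *request*, and the consumer sends a separate fetch request to each broker that leads one of its assigned partitions. With a single partition there is one broker and one fetch in flight; with partitions spread over several brokers, each broker's fetch waits and completes independently, so more than one fetch per minute can occur even though no single fetch ever reaches `fetch.min.bytes`. A back-of-the-envelope sketch (class and method names are mine, for illustration only):

```java
// fetch.min.bytes / fetch.max.wait.ms are evaluated per broker fetch
// request, not across the whole subscription.
public class FetchMath {
    static long maxFetchResponsesPerMinute(int brokersWithAssignedPartitions,
                                           long fetchMaxWaitMs) {
        // Worst case with no data at all: each broker's fetch parks for the
        // full wait and returns independently of the others.
        return brokersWithAssignedPartitions * (60_000 / fetchMaxWaitMs);
    }

    public static void main(String[] args) {
        // Single-partition topic -> one leader broker -> one fetch per minute.
        System.out.println(maxFetchResponsesPerMinute(1, 60_000)); // 1
        // Partitions led by 3 brokers -> up to 3 fetch responses per minute.
        System.out.println(maxFetchResponsesPerMinute(3, 60_000)); // 3
    }
}
```

This alone would not explain per-second reads, but it is consistent with the single-partition case behaving as expected.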

mosh
  • I am battling a similar issue. What instance type and number of brokers do you have configured? What settings are in your cluster configuration? – Dude0001 Jul 03 '22 at 03:53
  • @mosh - Like you have written there are other properties as well that determine consumer behaviour. Can you be specific what do you mean when you say `it doesn't work well on MSK?` What is the scenario that you are testing - what is the expected behaviour and what is the observed behaviour? – Rishabh Sharma Jul 12 '22 at 12:11
  • Hi, thanks for the reply. The scenario - I have a topic which constantly has messages written into it. The expected behavior - the consumer should read at most once a minute (I'm way below the `fetch.min.bytes` limit). Actual - the consumer reads at a much faster rate, sometimes every second. One important thing I noticed - if the topic has a single partition it works well; if the topic has more than one, the behavior is unexpected @RishabhSharma – mosh Jul 13 '22 at 12:39

1 Answer


I have not used the Spring consumer myself, but after doing some research it seems it is not possible to achieve what you are trying to do. As per this thread, the poll duration cannot be configured in the listener implementation.

However, you can write your own poll logic and achieve the desired behaviour using the poll duration and max poll records. You can use this code as a reference and configure:

  • Poll duration as 60 seconds
  • max.poll.records
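For what it's worth, a single `poll` returns as soon as *any* records are buffered, so a 60-second poll timeout only bounds the wait when no data is available. One way to get "wait until enough bytes or until the deadline" semantics client-side is to accumulate across short polls against a deadline. A stdlib-only sketch of that idea (the `fetchOnce` hook and the byte counting are my assumptions; with the real client, `fetchOnce` would wrap `consumer.poll(Duration.ofSeconds(1))` and sizes would come from each record's serialized size):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class AccumulatingPoller {
    // Accumulate records from repeated short polls until either minBytes
    // have been gathered or maxWaitMs have elapsed, then return the batch.
    static List<byte[]> pollBatch(Supplier<List<byte[]>> fetchOnce,
                                  long minBytes, long maxWaitMs) {
        List<byte[]> batch = new ArrayList<>();
        long bytes = 0;
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (bytes < minBytes && System.currentTimeMillis() < deadline) {
            for (byte[] record : fetchOnce.get()) { // stand-in for consumer.poll(...)
                batch.add(record);
                bytes += record.length;
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        // Fake source: each "poll" yields one 1 MB record immediately.
        List<byte[]> batch = pollBatch(() -> List.of(new byte[1_000_000]),
                                       3_000_000, 10_000);
        System.out.println(batch.size()); // 3 records before the byte threshold is met
    }
}
```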
Rishabh Sharma
  • Can you please be a bit more specific? I want a behavior of - wait 60 seconds if there isn't "enough" data accumulated in the topic yet. I don't see how you get it with the above references. – mosh Jul 14 '22 at 06:19
  • `final ConsumerRecords consumerRecords = consumer.poll(1000);` This poll is a blocking call; you will not get messages back until the poll duration elapses. So you can provide 60 seconds as the poll duration, and this way it would be ensured that you get the accumulated message batch after 60 seconds. Poll could also return due to max poll records. See [this](https://stackoverflow.com/questions/72938880/kafka-conumer-poll-argument/72943199#72943199) – Rishabh Sharma Jul 14 '22 at 06:47
  • But the parameter for poll is the timeout, if there is data it will return at once, won't block. "timeout - The time, in milliseconds, spent waiting in poll if data is not available in the buffer. If 0, returns immediately with any records that are available currently in the buffer, else returns empty. Must not be negative." – mosh Jul 14 '22 at 12:35
  • Spring **does** support setting the `pollTimeout` (on `ContainerProperties` with a default of 5 seconds) but, as the OP states, this has no influence on how the fetch max wait and min bytes behave, it just sets an upper limit. – Gary Russell Jul 14 '22 at 13:09
  • @mosh No, your understanding is incorrect. Say poll timeout=60 seconds and max.poll.records=100K. Now if there are 10K new records available in Kafka, the poll call will NOT return immediately. Re-read the thread regarding the explanation of the two parameters from my previous comment. – Rishabh Sharma Jul 14 '22 at 14:40
  • @GaryRussell if you have the solution on how to configure the poll timeout in spring consumer could you please add the details on how? I was unable to find it and it would be good to know. – Rishabh Sharma Jul 15 '22 at 04:19
  • When using spring-boot: https://docs.spring.io/spring-boot/docs/current/reference/html/application-properties.html#application-properties.integration.spring.kafka.listener.poll-timeout otherwise https://docs.spring.io/spring-kafka/docs/current/reference/html/#pollTimeout – Gary Russell Jul 15 '22 at 13:22
  • @RishabhSharma - I tried to configure pollTimeout + max.poll.records=100K and messages are read as fast as they come. I don't think the timeout property waits for records to accumulate to the value of max.poll.records – mosh Jul 31 '22 at 07:19
  • @mosh if you are still using the Spring Boot consumer, have a look at Gary Russell's comment. I am afraid I won't be able to help you much with that. However, in case you have written your own poll logic as I wrote in my answer, please share the code (including configuration). I will gladly take a look. – Rishabh Sharma Jul 31 '22 at 16:02
  • @mosh I am facing same issue using spring boot and confluent kafka. Were you able to achieve the wait until there are 15000000 bytes of messages or 1 minute has passed ? – Nitin Jan 19 '23 at 23:25
  • 1
    @Nitin it dosen't work if you have more than one partition – mosh Jan 26 '23 at 20:28