Behavior of the Kafka consumer poll() method when it is called only once

Question

Let's say I call poll() just once with some timeout, instead of calling it in a while(true) {...poll...} loop.

Will the consumer get all the records from the last committed offset to the latest available offset in one shot?
Does the timeout parameter matter? For example, if timeout == 0 and there are millions of records, what happens?

According to my experiment, the behavior of poll() seems somewhat undefined: (1) using while(true) {...poll...}, the number of records per poll looks random to me; (2) I didn't find a relationship between the timeout and the number of records per poll either. But it definitely doesn't guarantee getting all available records in one shot. Could someone please explain? – hawkssss, Apr 01 '19 at 21:37
This post probably has the best explanation: https://stackoverflow.com/questions/51753883/increase-the-number-of-messages-read-by-a-kafka-consumer-in-a-single-poll – hawkssss, Apr 02 '19 at 15:26

score 1 · Answer 1 · edited Apr 01 '19 at 21:35


According to the Kafka documentation, `max.poll.records` defaults to `500`:

The maximum number of records returned in a single call to poll().

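Therefore, each call to poll() returns at most 500 records (with the default `max.poll.records=500`), starting from the consumer's current position, which is initially the last committed offset. So a single poll() will not return millions of records in one shot. It also does not wait to fill a batch: it returns as soon as some records are available, so a batch can be smaller than 500, and it returns an empty result if nothing arrives before the timeout expires.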
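A minimal sketch of the per-poll cap, written as a hypothetical toy model in plain Java (no kafka-clients dependency, no broker). `ToyConsumer`, `poll()`, `drain()` and `makeLog()` are illustrative names, not Kafka's API. The model assumes a single partition whose records are all already available and captures only the `max.poll.records` limit; a real `KafkaConsumer` can return smaller or empty batches depending on what has been fetched before the timeout and on fetch-size settings.

```java
import java.util.ArrayList;
import java.util.List;

public class PollCapDemo {

    /**
     * Hypothetical toy model of a consumer reading one partition.
     * NOT Kafka's implementation: it only models the rule that a single
     * poll() hands back at most maxPollRecords records from the current position.
     */
    static final class ToyConsumer {
        private final List<String> log;   // records available in the partition
        private final int maxPollRecords; // stands in for max.poll.records
        private int position;             // next offset to read

        ToyConsumer(List<String> log, int startOffset, int maxPollRecords) {
            this.log = log;
            this.position = startOffset;
            this.maxPollRecords = maxPollRecords;
        }

        /** Returns at most maxPollRecords records and advances the position past them. */
        List<String> poll() {
            int end = Math.min(log.size(), position + maxPollRecords);
            List<String> batch = new ArrayList<>(log.subList(position, end));
            position = end;
            return batch;
        }

        int position() {
            return position;
        }
    }

    /** Polls repeatedly until the position reaches endOffset; returns the size of each batch. */
    static List<Integer> drain(ToyConsumer consumer, int endOffset) {
        List<Integer> batchSizes = new ArrayList<>();
        while (consumer.position() < endOffset) {
            List<String> batch = consumer.poll();
            if (batch.isEmpty()) {
                break; // in this toy model an empty batch means nothing is left to read
            }
            batchSizes.add(batch.size());
        }
        return batchSizes;
    }

    /** Builds a hypothetical log of n records. */
    static List<String> makeLog(int n) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            log.add("record-" + i);
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> log = makeLog(1200);

        // One call only: returns the first 500 of the 1200 available records.
        ToyConsumer once = new ToyConsumer(log, 0, 500);
        System.out.println("single poll: " + once.poll().size());

        // Polling in a loop until caught up: batches of 500, 500 and 200.
        ToyConsumer looped = new ToyConsumer(log, 0, 500);
        System.out.println("batches: " + drain(looped, log.size()));
    }
}
```

Running `main` prints `single poll: 500` and then `batches: [500, 500, 200]`: one call yields at most one capped batch, and catching up to the end of the log takes a loop. With a real client, a comparable stopping condition could compare `position()` for each partition against a snapshot taken with `endOffsets()`; that part is not modeled here.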

edited Apr 01 '19 at 21:35 by Ryuzaki L · answered Apr 01 '19 at 21:31

It also says "By default, there is essentially no limit." https://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html – hawkssss Apr 01 '19 at 21:34

@hawkssss No limit means that you can set this configuration up to any `int` but it defaults to `500`. I'll update my answer to include this. Hope this helps. – Giorgos Myrianthous Apr 01 '19 at 21:36
Well, I just experimented with the default, and sometimes it returns more than 500 records in one poll... – hawkssss Apr 01 '19 at 21:42
