
Let's say I call poll() just once with some timeout, instead of in a while(true) {...poll...} loop:

  1. Will the consumer get all the records from last committed to the latest available in one shot?
  2. Does the timeout parameter matter? E.g. if timeout==0 and there are millions of records, what happens?
Giorgos Myrianthous
hawkssss
  • In my experiments, the poll behavior looks undefined: (1) with while(true) {...poll...}, the number of records per poll seems random to me; (2) I didn't find a relationship between the timeout and the number of records per poll either. It definitely doesn't guarantee getting all available records in one shot, though. Could someone please explain? – hawkssss Apr 01 '19 at 21:37
  • This post probably has the best explanation: https://stackoverflow.com/questions/51753883/increase-the-number-of-messages-read-by-a-kafka-consumer-in-a-single-poll – hawkssss Apr 02 '19 at 15:26

1 Answer

According to the Kafka documentation, max.poll.records defaults to 500:

The maximum number of records returned in a single call to poll().

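Therefore, with the default max.poll.records=500, each call to poll() returns at most 500 records, starting from the consumer's current position (initially the last committed offset). A single poll() is not guaranteed to return everything up to the latest available record.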
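
To make the cap concrete, here is a minimal toy model in plain Java. It is not the Kafka client API: `ToyConsumer`, `pollsToDrain` and `firstPollSize` are hypothetical names invented for this sketch, and it assumes the only limit in play is a `max.poll.records`-style cap (real consumers are also affected by the timeout and by fetch-size settings).

```java
import java.util.ArrayList;
import java.util.List;

public class PollCapSketch {

    // Toy stand-in for a single-partition consumer; NOT the Kafka client API.
    static final class ToyConsumer {
        private final int logEnd;         // offset just past the latest available record
        private final int maxPollRecords; // mirrors the max.poll.records cap
        private int position;             // next offset to hand out

        ToyConsumer(int committedOffset, int logEnd, int maxPollRecords) {
            this.position = committedOffset; // start from the last committed offset
            this.logEnd = logEnd;
            this.maxPollRecords = maxPollRecords;
        }

        // Returns at most maxPollRecords offsets, then advances the position.
        List<Integer> poll() {
            int end = Math.min(logEnd, position + maxPollRecords);
            List<Integer> batch = new ArrayList<>();
            for (int offset = position; offset < end; offset++) {
                batch.add(offset);
            }
            position = end;
            return batch;
        }
    }

    // Size of the batch returned by one poll() call.
    static int firstPollSize(int committedOffset, int logEnd, int maxPollRecords) {
        return new ToyConsumer(committedOffset, logEnd, maxPollRecords).poll().size();
    }

    // Number of non-empty polls needed to reach the end of the log.
    static int pollsToDrain(int committedOffset, int logEnd, int maxPollRecords) {
        ToyConsumer consumer = new ToyConsumer(committedOffset, logEnd, maxPollRecords);
        int polls = 0;
        while (!consumer.poll().isEmpty()) {
            polls++;
        }
        return polls;
    }

    public static void main(String[] args) {
        // 1200 records available, cap of 500 per poll.
        System.out.println(firstPollSize(0, 1200, 500)); // 500
        System.out.println(pollsToDrain(0, 1200, 500));  // 3
    }
}
```

In this model a single poll() with 1200 records pending returns only 500 of them, and a loop needs three polls to reach the end, which is why consumers are normally written around a polling loop rather than one call.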

Ryuzaki L
Giorgos Myrianthous
  • It also says "By default, there is essentially no limit." https://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html – hawkssss Apr 01 '19 at 21:34
  • @hawkssss No limit means that you can set this configuration up to any `int` but it defaults to `500`. I'll update my answer to include this. Hope this helps. – Giorgos Myrianthous Apr 01 '19 at 21:36
  • Well, I just tried it with the defaults, and sometimes it gets more than 500 records in one poll... – hawkssss Apr 01 '19 at 21:42