
We have a Spring Cloud Stream listener that operates in batch mode. The processing time for each batch is about 3 ms. Our consumer configuration is as follows:

allow.auto.create.topics = true
auto.commit.interval.ms = 100
auto.offset.reset = earliest
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
fetch.max.bytes = 5242880
fetch.max.wait.ms = 300000
fetch.min.bytes = 2097152
heartbeat.interval.ms = 3000
isolation.level = read_uncommitted
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 330000
retry.backoff.ms = 100

We see the following behaviour:

May 23, 2020 @ 23:43:48.572 Consumed 463 messages  
May 23, 2020 @ 23:43:47.791 Consumed 500 messages  
--- 5 mins Gap ---                                 
May 23, 2020 @ 23:38:47.764 Consumed 17 messages   
May 23, 2020 @ 23:38:47.386 Consumed 500 messages  
May 23, 2020 @ 23:38:46.989 Consumed 500 messages  
May 23, 2020 @ 23:38:46.540 Consumed 500 messages  
--- 5 mins Gap ---                                 
May 23, 2020 @ 23:33:46.514 Consumed 106 messages  
May 23, 2020 @ 23:33:46.155 Consumed 500 messages  
May 23, 2020 @ 23:33:45.785 Consumed 500 messages  
May 23, 2020 @ 23:33:45.358 Consumed 500 messages  

As the log shows, there is a 5-minute gap before the next set of messages arrives, and this pattern keeps repeating.
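For a sense of scale, the complete cycles in the log can be totalled: each delivers roughly 1,500 to 1,600 records before the next 5-minute gap, so the effective rate is only about 5 records per second even though each batch takes ~3 ms to process. A minimal Java sketch of that arithmetic follows. The class and method names are hypothetical, and the cycle length is approximated by the 300 s gap.

```java
import java.util.Arrays;

/**
 * Worked arithmetic for the consumption log: batch sizes within one cycle,
 * with cycles separated by ~5-minute (300 s) gaps. Names are hypothetical.
 */
final class CycleThroughput {

    /** Total number of records delivered in one cycle of back-to-back polls. */
    static int cycleTotal(int[] batchSizes) {
        return Arrays.stream(batchSizes).sum();
    }

    /** Average records per second, approximating the cycle length by the gap. */
    static double recordsPerSecond(int[] batchSizes, double cycleSeconds) {
        return cycleTotal(batchSizes) / cycleSeconds;
    }

    public static void main(String[] args) {
        // Complete cycles taken from the log (batch sizes in chronological order).
        int[] cycleAt2333 = {500, 500, 500, 106};
        int[] cycleAt2338 = {500, 500, 500, 17};
        double gapSeconds = 5 * 60; // observed gap between cycles

        for (int[] cycle : new int[][] {cycleAt2333, cycleAt2338}) {
            System.out.printf("%d records per cycle, ~%.1f records/s%n",
                    cycleTotal(cycle), recordsPerSecond(cycle, gapSeconds));
        }
    }
}
```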

There is a huge backlog of messages in the Kafka partition waiting to be processed, so there is no shortage of ready messages.

We are not sure why we repeatedly get 5 minutes of silence. `fetch.max.wait.ms` and `max.poll.interval.ms` are both set to 5 minutes, but that should be fine, because `max.poll.records` (500) can be satisfied immediately on each poll.

`fetch.min.bytes` is 2 MB and `fetch.max.bytes` is 5 MB, which again should easily be satisfied by the volume of messages we have.
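One likely explanation, consistent with the gap being exactly `fetch.max.wait.ms` and with the fix in the answer below: the broker holds a fetch request until it can return at least `fetch.min.bytes` or until `fetch.max.wait.ms` expires, and the bytes it counts from each partition are capped at `max.partition.fetch.bytes`. With a single partition assigned, at most 1 MB can count toward the 2 MB minimum, so every fetch would wait the full 5 minutes regardless of the backlog. `max.poll.records` does not change this; it only limits how many already-fetched records each `poll()` hands to the listener, which would explain why each fetch shows up as several 500-record batches in quick succession. The sketch below is a simplified model under those assumptions (one busy partition, no data arriving during the wait, record-batch boundaries ignored); the class and method names are hypothetical and this is not the broker's actual code.

```java
/**
 * Simplified, hypothetical model of when a Kafka broker answers a fetch request:
 * it responds once the bytes it can return reach fetch.min.bytes, otherwise it
 * waits fetch.max.wait.ms. Each partition contributes at most
 * max.partition.fetch.bytes. Ignores batch boundaries and newly arriving data.
 */
final class FetchWaitModel {

    /** Bytes counted toward fetch.min.bytes, capped per partition. */
    static long countableBytes(long[] backlogPerPartition, long maxPartitionFetchBytes) {
        long total = 0;
        for (long backlog : backlogPerPartition) {
            total += Math.min(backlog, maxPartitionFetchBytes);
        }
        return total;
    }

    /** Milliseconds the broker waits before responding, under this model. */
    static long brokerWaitMs(long[] backlogPerPartition, long maxPartitionFetchBytes,
                             long fetchMinBytes, long fetchMaxWaitMs) {
        long available = countableBytes(backlogPerPartition, maxPartitionFetchBytes);
        return available >= fetchMinBytes ? 0 : fetchMaxWaitMs;
    }

    public static void main(String[] args) {
        long[] oneBusyPartition = {500L * 1024 * 1024}; // hypothetical 500 MB backlog
        long minBytes = 2_097_152;                      // fetch.min.bytes (2 MB)
        long maxWait = 300_000;                         // fetch.max.wait.ms (5 min)

        // Original setting: per-partition cap (1 MB) is below fetch.min.bytes.
        System.out.println("max.partition.fetch.bytes=1MB -> wait "
                + brokerWaitMs(oneBusyPartition, 1_048_576, minBytes, maxWait) + " ms");
        // Per-partition cap raised to 5 MB.
        System.out.println("max.partition.fetch.bytes=5MB -> wait "
                + brokerWaitMs(oneBusyPartition, 5_242_880, minBytes, maxWait) + " ms");
    }
}
```

Under the same model, a consumer assigned two busy partitions would reach 2 MB with the original 1 MB cap, which is one reason the behaviour might show up only for some consumers.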

Please let me know what I am missing.

Srikanth
`fetch.max.wait.ms = 300000` with `fetch.min.bytes = 2097152` implies that the max wait is the problem, since the gap is exactly 5 minutes. Have you tried reducing it to see whether the behavior changes? (I know you said you think you have enough data to satisfy it, but reducing it might help your investigation.) – Gary Russell May 24 '20 at 16:04

1 Answer


I figured out that I had to set `max.partition.fetch.bytes` (rather than `fetch.max.bytes`) to 5 MB. That made the wait go away.
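Raising `max.partition.fetch.bytes` to 5 MB puts the per-partition cap above the 2 MB `fetch.min.bytes`, so a single busy partition can satisfy the minimum without waiting for `fetch.max.wait.ms`. A simple guard against reintroducing the problem is to check that `fetch.min.bytes` does not exceed `max.partition.fetch.bytes`. The sketch below is a hypothetical helper built on plain `java.util.Properties` with the literal Kafka property names; it assumes the consumer may be assigned only one partition.

```java
import java.util.Properties;

/**
 * Hypothetical sanity check: if a consumer reads a single partition, at most
 * max.partition.fetch.bytes can count toward fetch.min.bytes, so a larger
 * fetch.min.bytes would make every fetch wait for fetch.max.wait.ms.
 */
final class FetchConfigCheck {

    static long getLong(Properties props, String key) {
        return Long.parseLong(props.getProperty(key));
    }

    /** True if one partition alone can supply fetch.min.bytes. */
    static boolean minBytesReachableFromOnePartition(Properties props) {
        return getLong(props, "fetch.min.bytes") <= getLong(props, "max.partition.fetch.bytes");
    }

    /** Fetch-related settings after the change described in the answer. */
    static Properties fixedFetchSettings() {
        Properties props = new Properties();
        props.setProperty("fetch.min.bytes", "2097152");           // 2 MB
        props.setProperty("fetch.max.bytes", "5242880");           // 5 MB
        props.setProperty("max.partition.fetch.bytes", "5242880"); // raised from 1 MB
        props.setProperty("fetch.max.wait.ms", "300000");          // unchanged: 5 minutes
        return props;
    }

    public static void main(String[] args) {
        Properties fixed = fixedFetchSettings();
        Properties original = fixedFetchSettings();
        original.setProperty("max.partition.fetch.bytes", "1048576"); // original 1 MB value

        System.out.println("original reachable: " + minBytesReachableFromOnePartition(original));
        System.out.println("fixed reachable:    " + minBytesReachableFromOnePartition(fixed));
    }
}
```

In application code the same keys are available as constants on `org.apache.kafka.clients.consumer.ConsumerConfig` (for example `MAX_PARTITION_FETCH_BYTES_CONFIG` and `FETCH_MIN_BYTES_CONFIG`). Following the comment on the question, lowering `fetch.max.wait.ms` (the Kafka default is 500 ms) or `fetch.min.bytes` would also bound the wait.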

Srikanth