We are facing a strange issue in only one of our environments (with the same consumer app).
Basically, lag suddenly starts to build up on only one of the topics on the Kafka broker (which hosts multiple topics). That topic is consumed by 10 members in a single consumer group.
Multiple restarts, adding another pod of the consumer application, and changing default configuration properties (max poll records, session timeout) have NOT helped much so far.
Looking for any suggestions or advice on how to debug the issue. We tried enabling Apache logs, CloudWatch, etc., but so far we have only seen that regular/periodic rebalancing is happening, even at a very low load of 7k messages waiting to be processed.
Below are the environment details:
- App - Spring Boot app on version 2.7.2
- Platform - AWS
- Kafka - MSK
- Kafka Broker - 3 brokers (version 2.8.x)
- Consumer group - 1, with 15 members (1 topic, 8 partitions)
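
To make the numbers above concrete, here is a minimal sketch of how we understand lag and member assignment. It assumes lag per partition is the log-end offset minus the group's committed offset, and that a partition is assigned to at most one member of a group. The class `LagSketch`, its method names, and the offsets in `main` are hypothetical and made up for illustration; they are not taken from our cluster. Treating a missing committed offset as 0 is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, plain Java only (no Kafka client calls).
public class LagSketch {

    // Lag for one partition: log-end offset minus the group's committed offset.
    static long partitionLag(long endOffset, long committedOffset) {
        return Math.max(0L, endOffset - committedOffset);
    }

    // Per-partition lag keyed by partition number.
    // Assumption: a partition with no committed offset is treated as committed at 0.
    static Map<Integer, Long> lagByPartition(Map<Integer, Long> endOffsets,
                                             Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag.put(e.getKey(), partitionLag(e.getValue(), committed));
        }
        return lag;
    }

    // Sum of the per-partition lags.
    static long totalLag(Map<Integer, Long> lagByPartition) {
        long total = 0L;
        for (long v : lagByPartition.values()) {
            total += v;
        }
        return total;
    }

    // Members beyond the partition count receive no partitions and sit idle.
    static int idleMembers(int members, int partitions) {
        return Math.max(0, members - partitions);
    }

    public static void main(String[] args) {
        // Made-up offsets for two partitions, for illustration only.
        Map<Integer, Long> end = new LinkedHashMap<>();
        end.put(0, 12_000L);
        end.put(1, 9_000L);
        Map<Integer, Long> committed = new LinkedHashMap<>();
        committed.put(0, 8_000L);
        committed.put(1, 6_000L);

        Map<Integer, Long> lag = lagByPartition(end, committed);
        System.out.println("lag by partition: " + lag);
        System.out.println("total lag: " + totalLag(lag));
        // With the 15 members and 8 partitions listed above:
        System.out.println("idle members: " + idleMembers(15, 8));
    }
}
```

With 8 partitions, at most 8 members can hold an assignment, so any members beyond that count do not contribute to throughput. Seeing the lag per partition would tell us whether it is spread evenly or concentrated on a few partitions.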