I'm running Spring Boot applications in a k8s cluster with Kafka. During a rolling update, or when scaling my services, some of them rebalance. That is expected, since consumers are being added or removed, but a service that is rebalancing stops serving traffic until the rebalance finishes.
I'm using:
- Spring Boot 2.1.1.RELEASE
- Spring Integration Kafka 3.1.0.RELEASE
- Spring Kafka 2.2.7.RELEASE
I have 3 topics, each with 2000 partitions. The number of service instances ranges from 30 to 50 depending on the system load, and I'm using consumer groups for each topic.
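To give a sense of scale, here is a rough sketch of how many partitions of one topic each instance would own. It assumes one consumer per service instance in the topic's group and an even assignment; the class and method names are made up for illustration and are not part of my application.

```java
public class PartitionSpread {

    // Returns {min, max} partitions per consumer, assuming the assignor
    // spreads the partitions of a single topic as evenly as possible.
    static int[] spread(int partitions, int consumers) {
        int min = partitions / consumers;
        // If the division leaves a remainder, some consumers get one extra partition.
        int max = (partitions % consumers == 0) ? min : min + 1;
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int partitions = 2000; // partitions per topic, as described above
        for (int consumers : new int[] {30, 50}) {
            int[] s = spread(partitions, consumers);
            System.out.println(consumers + " consumers: "
                    + s[0] + "-" + s[1] + " partitions each");
        }
    }
}
```

So every rebalance in a group moves on the order of 40 to 67 partitions per instance, per topic, under these assumptions.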
At first I thought the new services were signaling readiness (via the Actuator readiness probe) too early, so that they accepted traffic before they were actually ready. That's not the case, though, since the existing instances also stop serving traffic while they are rebalancing.
What are the best practices for scaling or rolling updates that trigger as little rebalancing as possible?