I'm trying to solve horizontal scaling for a web application in which some sessions cause high CPU usage. The idea is to use a readiness probe to tell Kubernetes that a pod is busy with its current task and that new traffic should be sent to another pod (the HPA will do its work and bring up a new pod).
However, I want the session that is already being processed on the original pod to stay active, so that the result is delivered to the user once the work is done.
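A minimal sketch of the kind of readiness endpoint described above, using only the Python standard library. The `/ready` path, port 8080, and the `busy` flag are hypothetical names chosen for illustration and are not taken from the real application. It relies on the fact that an HTTP readiness probe counts any status from 200 to 399 as success, so 503 marks the pod as not ready.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical flag: the application would set it while a CPU-heavy
# session is being processed and clear it when the work is done.
busy = threading.Event()


def readiness_status() -> int:
    """Return the HTTP status the readiness endpoint should report."""
    # 503 makes the HTTP readiness probe fail; 200 makes it succeed.
    return 503 if busy.is_set() else 200


class ReadinessHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the assumed /ready path is served; anything else is 404.
        status = readiness_status() if self.path == "/ready" else 404
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Serve the readiness endpoint until the process is stopped."""
    HTTPServer(("", port), ReadinessHandler).serve_forever()
```

The worker code would call `busy.set()` before starting the heavy task and `busy.clear()` afterwards, and the pod's `readinessProbe` would point at the assumed `/ready` path.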
My question: when the readiness probe fails, will Kubernetes:
- Stop routing ALL traffic to the pod and drop the current sessions that were opened through the ingress, or
- Stop routing only NEW traffic to the pod, while current sessions stay active for the specified timeout?
Thank you in advance.