I have a GKE cluster (1.12.10-gke.17).
I'm running the nginx-ingress-controller with `type: LoadBalancer`. I've set `externalTrafficPolicy: Local` to preserve the source IP.

Everything works great, except during rolling updates. I have `maxSurge: 1` and `maxUnavailable: 0`.
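
For reference, the relevant pieces of the Service and Deployment look roughly like this (names, labels, image version, and ports are placeholders, not my exact manifests):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # preserve the client source IP
  selector:
    app: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: https
      port: 443
      targetPort: https
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
spec:
  replicas: 1                    # placeholder
  selector:
    matchLabels:
      app: ingress-nginx
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # bring the new pod up first
      maxUnavailable: 0  # never go below the desired replica count
  template:
    metadata:
      labels:
        app: ingress-nginx
    spec:
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1  # placeholder
          ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
```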
My problem is that during a rolling update, I start getting request timeouts. I suspect the Google load balancer is still sending requests to the node where the pod is `Terminating`, even though the health checks are failing. This happens for about 30-60s, starting right when the pod changes from `Running` to `Terminating`. Everything stabilizes after a while and traffic eventually goes only to the new node with the new pod.
If the load balancer is slow to stop sending requests to a terminating pod, is there some way to make these rolling deploys hitless?
My understanding is that with a normal k8s service, where `externalTrafficPolicy` is left at the default (`Cluster`), the Google load balancer simply sends requests to all nodes and lets the iptables rules sort it out. When a pod is `Terminating`, the iptables rules are updated quickly and traffic stops being sent to that pod. When `externalTrafficPolicy` is `Local`, however, if the node that receives the request does not have a `Running` pod, the request times out, which is what is happening here.
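
As far as I can tell, the way the load balancer decides which nodes have a local endpoint is the `healthCheckNodePort` that kube-proxy serves on every node. Hitting that port on a node reports the number of local Ready endpoints, something like this (example values; the Service name here is a placeholder):

```json
{
  "service": {
    "namespace": "ingress-nginx",
    "name": "ingress-nginx"
  },
  "localEndpoints": 0
}
```

kube-proxy answers with a 503 when `localEndpoints` is 0, so the node fails the load balancer's health check.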
If this is correct, then I only see two options:

1. stop sending requests to the node with a `Terminating` pod
2. continue servicing requests even though the pod is `Terminating`
I feel like option 1 is difficult, since it requires informing the load balancer that the pod is about to start `Terminating`.
I've made some progress on option 2, but so far haven't gotten it working. I've managed to continue serving requests from the pod by adding a `preStop` lifecycle hook which just runs `sleep 60`, but I think the problem is that the `healthCheckNodePort` reports `localEndpoints: 0`, and I suspect something is blocking the request between arriving at the node and getting to the pod. Perhaps the iptables rules aren't routing when `localEndpoints: 0`.
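
The hook itself is essentially this (a sketch of the relevant part of the Deployment; the container name is a placeholder, and the bumped grace period is my assumption about what the hook needs so it isn't killed early):

```yaml
spec:
  template:
    spec:
      # Assumption: the grace period must exceed the preStop sleep, otherwise
      # the pod is SIGKILLed before the hook finishes (the default is 30s).
      terminationGracePeriodSeconds: 90
      containers:
        - name: nginx-ingress-controller   # placeholder name
          lifecycle:
            preStop:
              exec:
                # Keep the old pod around (and nginx serving) for a while
                # after the rolling update marks it Terminating.
                command: ["/bin/sh", "-c", "sleep 60"]
```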
I've also adjusted the Google load balancer health check, which is separate from the `readinessProbe` and `livenessProbe`, to the fastest settings possible (1s interval, failure threshold of 1). I've verified that the load balancer backend, i.e. the k8s node, does indeed fail its health checks quickly, but the load balancer continues to send requests to the terminating pod anyway.