Liveness probe on microservice under stress test gets pod killed

Question

My pod is failing stress tests. It serves calls asynchronously, and when OCP checks liveness while the pod is overloaded, the probe times out. The Kubernetes liveness probe values are as follows:

  successThreshold: 1
  failureThreshold: 3
  timeoutSeconds: 10

Once this pod is killed, its traffic shifts to the other pods in the replica set. That makes things even worse for them, and they are all doomed to fail for the same reason.
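To make the cascade concrete, here is a deliberately simplified Java model built on hypothetical assumptions: traffic is split evenly across the live replicas, each replica can absorb a fixed request rate before its probes start timing out, and a replica that is being restarted takes no traffic. The class name and all numbers are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the cascade described in the question: total load is
 * split evenly across live replicas, and a replica whose share exceeds its
 * capacity misses its liveness probes and is taken out of rotation, so its
 * share moves to the survivors.
 */
final class ReplicaCascade {

    /** Returns how many replicas are still live after each failure, starting with all of them. */
    static List<Integer> simulate(int replicas, double totalLoad, double capacityPerReplica) {
        List<Integer> liveCounts = new ArrayList<>();
        int live = replicas;
        liveCounts.add(live);
        while (live > 0) {
            double share = totalLoad / live;      // even load balancing across live pods
            if (share <= capacityPerReplica) {
                break;                            // every live pod keeps up, probes keep passing
            }
            live--;                               // an overloaded pod fails its probes and is restarted
            liveCounts.add(live);
        }
        return liveCounts;
    }
}
```

With 3 replicas at a hypothetical capacity of 350 req/s each, `simulate(3, 1200, 350)` drops from 3 live replicas to 0, while `simulate(3, 900, 350)` stays at 3. Real pods come back after restarting, so in practice this tends to show up as repeated restart loops rather than a clean drop to zero.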

I am using Quarkus with Mutiny/Vert.x for async calls, but I think the problem is generic. How can I give liveness probe requests priority?

Thanks

Answer (score 0) · answered May 20 '22 at 09:39

Liveness probes are pretty straightforward: the kubelet runs the probe periodically, and if it fails failureThreshold times in a row (3 in your configuration), the kubelet considers the container dead and restarts it.
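The counting rule can be sketched as a small Java model. This is a simplified, hypothetical illustration, not kubelet code: it only covers timeouts (a real HTTP probe also fails on a non-2xx/3xx status), and the class and method names are made up for the example.

```java
/**
 * Simplified, hypothetical model of liveness probe evaluation: a probe that
 * does not answer within timeoutSeconds counts as a failure, and the container
 * is restarted after failureThreshold consecutive failures. A single success
 * resets the streak (successThreshold must be 1 for liveness probes).
 */
final class LivenessProbeModel {
    private final int failureThreshold;
    private final double timeoutSeconds;
    private int consecutiveFailures = 0;

    LivenessProbeModel(int failureThreshold, double timeoutSeconds) {
        this.failureThreshold = failureThreshold;
        this.timeoutSeconds = timeoutSeconds;
    }

    /**
     * Records one probe whose response took responseSeconds
     * (Double.POSITIVE_INFINITY if it never answered).
     * Returns true if the container would now be restarted.
     */
    boolean recordProbe(double responseSeconds) {
        if (responseSeconds <= timeoutSeconds) {
            consecutiveFailures = 0;   // any success resets the streak
            return false;
        }
        consecutiveFailures++;         // timed out: counts as one failure
        return consecutiveFailures >= failureThreshold;
    }
}
```

With the question's values (`new LivenessProbeModel(3, 10)`), one slow response followed by a fast one does not trigger a restart, but three consecutive responses slower than 10 seconds do. The question does not show periodSeconds; its default is 10 seconds, so the pod only has to stay unresponsive for a few probe periods before it is restarted.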

It seems to me that the stress test is overloading the pod, the network, or some other resource enough that the pod cannot answer the liveness probes within timeoutSeconds.

Giving liveness probes priority is something you need to handle in your application, and it will only help if the problem really is how the application queues requests. If the pod has no CPU available at all, or the network is completely saturated, you cannot fix that in your application. In that case, you need to figure out exactly what is saturated and making the probes time out. Infrastructure and application metrics will probably show you which resource is at 100% capacity.
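One way to stop probe requests from waiting behind application work is to serve the liveness endpoint from its own listener and thread. The Java sketch below assumes the JDK's built-in `com.sun.net.httpserver` rather than Quarkus or SmallRye Health; the class name, the `/live` path and the port are hypothetical. It isolates the probe rather than truly prioritizing it, so, as noted above, it cannot help if the CPU or network itself is saturated.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical sketch: a separate HTTP listener with one dedicated thread that
 * only answers liveness probes, so probe requests never share a queue with
 * application requests.
 */
final class DedicatedHealthServer implements AutoCloseable {
    private final HttpServer server;
    private final ExecutorService probeThread;

    DedicatedHealthServer(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        probeThread = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "liveness-probe");
            t.setDaemon(true);                      // do not keep the JVM alive on its own
            return t;
        });
        server.createContext("/live", exchange -> {
            try {
                // Keep the handler trivial: it only proves the process can still answer,
                // and deliberately does not call downstream services or app queues.
                byte[] body = "{\"status\":\"UP\"}".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
            } finally {
                exchange.close();                   // ends the exchange and closes both streams
            }
        });
        server.setExecutor(probeThread);            // probe handling runs only on this thread
        server.start();
    }

    /** The port actually bound (useful when constructed with port 0). */
    int port() {
        return server.getAddress().getPort();
    }

    @Override
    public void close() {
        server.stop(0);
        probeThread.shutdownNow();
    }
}
```

The pod's livenessProbe would then point its httpGet at this separate port and path instead of the application port.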

Thanks, I know the theory. I am now trying to move the health check to another port (not the application port), but with Quarkus SmallRye Health that does not seem to be possible by just changing a property. - Sergio, May 20 '22 at 15:16