Kubernetes Service unavailable when container crashes

Question

In my Kubernetes cluster, I have a single pod (i.e. one replica) with two containers: server and cache.

I also have a Kubernetes Service that matches my pod.

If cache is crashing, when I try to send an HTTP request to server via my Service, I get a "503 Service Temporarily Unavailable".

The HTTP request is going into the cluster via Nginx Ingress, and I suspect that the problem is that when cache is crashing, Kubernetes removes my one pod from the Service load balancers, as promised in the Kubernetes documentation:

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

I don't want this behavior, since I still want server to be able to respond to requests even if cache has failed. Is there any way to achieve this?

It would be a lot to share the full deployment -- I created a toy example for the purposes of sharing. — Matt S, Jul 29 '21 at 17:57

score 1 · Answer 1 · answered Jul 29 '21 at 17:26

A pod is brought to the "Failed" state if one of the following conditions occurs:

One of its containers exits with a non-zero status.
Kubernetes terminates a container because its health check is failing.

So, if you need one of your containers to keep responding when another one fails:

Make sure your liveness probe points at the container you need to keep running. The health check will then always succeed, and the pod will not be marked as "Failed".
Make sure the readiness probe points at the container you need to keep running. This ensures that the load balancer still sends traffic to your pod.
Make sure that you handle container errors gracefully and make the containers exit with a zero status code.

In the following example readiness and liveness probes, make sure that port 8080 is served by the server container and that it has the /healthz and /ready routes active.

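    # Both probes belong to the container that must keep serving (server).
    readinessProbe:
      httpGet:
        path: /healthz   # this route must exist in the server container
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
    livenessProbe:
      httpGet:
        path: /ready     # this route must exist in the server container
        port: 8080
      initialDelaySeconds: 5
      timeoutSeconds: 1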
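
To show where these probes sit in a manifest, here is a minimal sketch of a Deployment with the two containers from the question. The name `my-app`, the `app: my-app` label, and both images are hypothetical; only the probe settings come from the example above. Probes are defined per container, so they are attached to `server`, and `cache` has none.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app                          # hypothetical name
    spec:
      replicas: 1                           # single pod, as in the question
      selector:
        matchLabels:
          app: my-app                       # hypothetical label
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
            - name: server
              image: example/server:latest  # hypothetical image
              ports:
                - containerPort: 8080
              readinessProbe:
                httpGet:
                  path: /healthz
                  port: 8080
                initialDelaySeconds: 5
                periodSeconds: 5
              livenessProbe:
                httpGet:
                  path: /ready
                  port: 8080
                initialDelaySeconds: 5
                timeoutSeconds: 1
            - name: cache
              image: example/cache:latest   # hypothetical image, no probes

As the documentation quoted in the question says, a Pod is considered ready only when all of its containers are ready, so it is worth testing how this setup behaves while `cache` is crashing.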

Charlie

If the pod fails because one of its containers exits with a non-zero status, can we use liveness / readiness probes to get around that? Or will the probes not matter in this case? – Matt S Jul 29 '21 at 17:58
Try setting the pod's `restartPolicy` to `Never`. This applies to all the containers though. – Charlie Jul 29 '21 at 18:01
The desired behavior is that even if `cache` exits with a non-zero status, `server` can still be present in the list of available backends in the `Service`. If we use `restartPolicy: Never`, won't that mean we'll just lose the pod if `cache` exits? – Matt S Jul 29 '21 at 18:17
Kubernetes will not try to restart the containers with that setting, so I believe it is going to ignore the non-zero status. Give it a try. – Charlie Jul 29 '21 at 18:20
Yep, this works. Thanks! Wondering if there is a way to get the desired behavior without `restartPolicy: Never`. – Matt S Jul 29 '21 at 18:37
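
For reference, `restartPolicy` is a pod-level field, which is why it applies to every container in the pod. A minimal sketch follows, assuming hypothetical names and images. It uses a bare Pod because the pod template of a Deployment only accepts `restartPolicy: Always`.

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-app                        # hypothetical name
      labels:
        app: my-app                       # hypothetical label matched by the Service
    spec:
      restartPolicy: Never                # applies to both containers below
      containers:
        - name: server
          image: example/server:latest    # hypothetical image
        - name: cache
          image: example/cache:latest     # hypothetical image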

score 0 · Accepted Answer · answered Aug 03 '21 at 14:28

The behavior I am looking for is configurable on the Service itself via the publishNotReadyAddresses option:

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.21/#servicespec-v1-core
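
A minimal sketch of a Service with that option set might look like the following. The Service name, the `app: my-app` selector, and the port numbers are assumptions for illustration (8080 is borrowed from the probe example in the other answer); only `publishNotReadyAddresses` is the field this answer refers to.

    apiVersion: v1
    kind: Service
    metadata:
      name: my-service                  # hypothetical name
    spec:
      selector:
        app: my-app                     # hypothetical label; must match the pod's labels
      publishNotReadyAddresses: true    # publish the pod's address even when it is not ready
      ports:
        - port: 80                      # assumed Service port
          targetPort: 8080              # assumed port the server container listens on

With this set, the pod's address stays published regardless of readiness, so requests can also reach `server` while it is still starting up.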

Matt S
