Situation: I have a CronJob that often fails (this is expected at the moment). Because the container performing the job has a sidecar, the dependencies between the containers are expressed through bash scripts and a shared emptyDir mount at /etc/liveness:

        spec:
          containers:
          - args:
            - -c
            - set -x;
              ...
              ./process; # execute the main process
              rc=$?;
              rm /etc/liveness; # clean-up
              exit $rc;
            command:
            - /bin/bash
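
For illustration only, here is a rough sketch of how the sidecar's side of such a file-based dependency might look. The sidecar's script is not shown in this question, so the function name `wait_for_marker_removal`, the marker path argument, and the polling interval are all assumptions.

```shell
#!/bin/bash
# Hypothetical sidecar helper (not taken from the actual setup):
# block until the marker path on the shared emptyDir disappears.
# $1: marker path (assumption), $2: polling interval in seconds (assumption).
wait_for_marker_removal() {
  local marker="$1"
  local interval="${2:-1}"
  while [ -e "$marker" ]; do
    sleep "$interval"
  done
}
```

A sidecar written this way would call the function with the shared path and exit once the main container has removed it.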

Problem: When the job fails, I see the following in the logs:

+ rc=255
+ rm /etc/liveness
+ exit 255

With restartPolicy set to Never, the failed pod enters the Completed status, which is misleading:

scheduler-1594015200-wl9xc   0/2     Completed     0          24m
Bernard Halas

2 Answers

A Pod's status field is a PodStatus object, which has a phase field.

Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase

Status and phase are not the same thing. What happens above is that my pods end up with status Completed and phase Failed.
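
To check this on a given pod, the phase and the per-container exit codes can be read directly from the pod's status with `kubectl get pod -o jsonpath`. This is a minimal sketch; the wrapper function names `pod_phase` and `container_exit_codes` are only illustrative.

```shell
#!/bin/bash
# Print the pod's phase (Pending, Running, Succeeded, Failed or Unknown).
# $1: pod name
pod_phase() {
  kubectl get pod "$1" -o jsonpath='{.status.phase}'
}

# Print one line per container: name, exit code and termination reason.
# Fields are empty for containers that have not terminated.
# $1: pod name
container_exit_codes() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.terminated.exitCode}{"\t"}{.state.terminated.reason}{"\n"}{end}'
}
```

For example, `pod_phase scheduler-1594015200-wl9xc` prints the phase of the pod from the question, and `container_exit_codes` shows which of its containers exited with a non-zero code.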

Bernard Halas

According to the official docs,

A Job creates one or more Pods and ensures that a specified number of them successfully terminate.

And a container enters the Terminated state when

it has successfully completed execution or when it has failed for some reason.

So if you set restartPolicy to Never, this is what will happen.

Ken Chen
  • I'm sorry, but I didn't get your answer. So why is the pod status `Completed` and not `Failed`? – Bernard Halas Jul 06 '20 at 07:55
  • The Pod is Failed. The job is Completed. – coderanger Jul 06 '20 at 08:46
  • Hi @coderanger, as you can see in the output of the `kubectl get pods` command in my question, the **pod** is not `Failed`, it's `Completed`. Besides, the docs describe the `Failed` phase as: "All Containers in the Pod have terminated, and at least one Container has terminated in failure. That is, the Container either exited with non-zero status or was terminated by the system." Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase – Bernard Halas Jul 06 '20 at 09:16