
We have cronjob monitoring in our cluster. If a pod does not appear within 24 hours, it means the cronjob hasn't run and we need to alert. But sometimes, due to garbage collection, the pod is deleted even though the job completed successfully. How can we keep all pods and avoid garbage collection? I know about finalizers, but it looks like they don't work in this case.

Wytrzymały Wiktor
  • What kubernetes cluster is used? Managed in cloud by provider? – moonkotte Oct 20 '21 at 09:32
  • It's an EKS cluster – Andrew Striletskyi Oct 21 '21 at 08:35
  • Is [cluster-autoscaler](https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html) enabled? Does your cluster scale nodes? Or someone else manually can do it? – moonkotte Oct 21 '21 at 08:39
  • cluster-autoscaler is enabled. It's not that someone deleted it :) Sometimes a pod randomly disappears, but the job completed :( – Andrew Striletskyi Oct 22 '21 at 11:47
  • 1
    Idea is not someone deleted the `pod`, but someone scaled down a node with that `pod`. I tested this on GKE cluster: two pods were scheduled on two different nodes, then I scaled down one node. Predictably one pod disappeared, however job is still in place. So it's better change the logic how to check it. e.g. to look into `job completion` - `$ kubectl get job hello-27248440 -o jsonpath='{.status.succeeded'}` and get `1` in case it was successful. – moonkotte Oct 22 '21 at 12:49
  • Also check how [gc works](https://kubernetes.io/docs/concepts/architecture/garbage-collection/) and the [TTL controller](https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/). Pods won't be deleted while their jobs are in place. There are no miracles. Check the events about node scaling. – moonkotte Oct 22 '21 at 12:53

1 Answer


Posting this as an answer since it describes a reason why this can happen.

Answer

Managed Kubernetes clusters in the cloud have node autoscaling policies, and node pools can also be scaled up or down manually.

A CronJob creates a Job for each run, which in turn creates a corresponding pod. Pods are scheduled onto specific nodes, so if a node is removed due to autoscaling or manual scaling, the pod(s) assigned to it are gone as well. The Jobs, however, are preserved, since they are stored in etcd.
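
A quick way to see this CronJob → Job → Pod ownership chain is to look at the objects' ownerReferences (the pod/job names below are placeholders, not taken from the example further down):

# The pod is owned by a Job:
$ kubectl get pod <pod-name> -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'
Job/<job-name>
# The Job in turn is owned by the CronJob:
$ kubectl get job <job-name> -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'
CronJob/<cronjob-name>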

There are two fields which control how many jobs are kept in the history:

  • .spec.successfulJobsHistoryLimit - defaults to 3
  • .spec.failedJobsHistoryLimit - defaults to 1

Setting them to 0 removes the jobs right after they finish.

See *Jobs History Limits* in the CronJob documentation.
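
If more history is needed than the defaults provide, the limits can also be inspected and raised on an existing CronJob; a small sketch (the CronJob name matches the test manifest below, and the values are illustrative):

# Show the current history limits (the API server fills in the defaults):
$ kubectl get cronjob test-cronjob -o jsonpath='{.spec.successfulJobsHistoryLimit} {.spec.failedJobsHistoryLimit}'
# Raise them, e.g. keep the last 20 successful and 5 failed jobs:
$ kubectl patch cronjob test-cronjob --type merge -p '{"spec":{"successfulJobsHistoryLimit":20,"failedJobsHistoryLimit":5}}'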

How it actually happens

I have a GCP GKE cluster with two nodes:

$ kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
gke-cluster-4-xxxx     Ready    <none>   15h     v1.21.3-gke.2001
gke-cluster-4-yyyy     Ready    <none>   3d20h   v1.21.3-gke.2001

cronjob.yaml for testing:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: test-cronjob
spec:
  schedule: "*/2 * * * *"
  successfulJobsHistoryLimit: 5
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: test
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
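
Applying the manifest (assuming it is saved as cronjob.yaml):

$ kubectl apply -f cronjob.yaml
cronjob.batch/test-cronjob created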

Pods created:

$ kubectl get pods -o wide
NAME                          READY   STATUS      RESTARTS   AGE     IP           NODE                 NOMINATED NODE   READINESS GATES
test-cronjob-27253914-mxnzg   0/1     Completed   0          8m59s   10.24.0.22   gke-cluster-4-xxxx   <none>           <none>
test-cronjob-27253916-88cjn   0/1     Completed   0          6m59s   10.24.0.25   gke-cluster-4-xxxx   <none>           <none>
test-cronjob-27253918-hdcg9   0/1     Completed   0          4m59s   10.24.0.29   gke-cluster-4-xxxx   <none>           <none>
test-cronjob-27253920-shnnp   0/1     Completed   0          2m59s   10.24.1.15   gke-cluster-4-yyyy   <none>           <none>
test-cronjob-27253922-cw5gp   0/1     Completed   0          59s     10.24.1.18   gke-cluster-4-yyyy   <none>           <none>

Scaling down one node:

$ kubectl get nodes
NAME                 STATUS                        ROLES    AGE   VERSION
gke-cluster-4-xxxx   NotReady,SchedulingDisabled   <none>   16h   v1.21.3-gke.2001
gke-cluster-4-yyyy   Ready                         <none>   3d21h   v1.21.3-gke.2001
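
On GKE a manual scale-down like this can be done, for example, by resizing the node pool; the cluster, pool and zone names below are placeholders:

$ gcloud container clusters resize <cluster-name> --node-pool <pool-name> --num-nodes 1 --zone <zone>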

And getting pods now:

$ kubectl get pods -o wide
NAME                          READY   STATUS      RESTARTS   AGE     IP           NODE                 NOMINATED NODE   READINESS GATES
test-cronjob-27253920-shnnp   0/1     Completed   0          7m47s   10.24.1.15   gke-cluster-4-yyyy   <none>           <none>
test-cronjob-27253922-cw5gp   0/1     Completed   0          5m47s   10.24.1.18   gke-cluster-4-yyyy   <none>           <none>

The previously completed pods that were on the first node are now gone.

Jobs are still in place:

$ kubectl get jobs
NAME                    COMPLETIONS   DURATION   AGE
test-cronjob-27253914   1/1           1s         13m
test-cronjob-27253916   1/1           2s         11m
test-cronjob-27253918   1/1           1s         9m55s
test-cronjob-27253920   1/1           34s        7m55s
test-cronjob-27253922   1/1           2s         5m55s

How it can be solved

Changing the monitoring alert to look at job completion is a much more precise method, and it is independent of any node scaling actions in the cluster.

E.g. I can still retrieve the result of job test-cronjob-27253916 even though its corresponding pod has been deleted:

$ kubectl get job test-cronjob-27253916 -o jsonpath='{.status.succeeded}'
1
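
To turn this into a 24-hour check like the one described in the question, a minimal sketch could look like the following. This is not the exact alert from the question, just an illustration; it assumes the jobs are named test-cronjob-*, that GNU date is available, and that an echo plus a non-zero exit code is enough to drive your alerting:

#!/bin/sh
# Find the completion time of the newest finished job of this CronJob.
latest=$(kubectl get jobs \
  -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.completionTime}{"\n"}{end}' \
  | awk '$1 ~ /^test-cronjob-/ && $2 != "" {print $2}' \
  | sort | tail -n 1)

# Alert if no job has ever completed, or the newest completion is older than 24h.
if [ -z "$latest" ] || [ $(( $(date -u +%s) - $(date -u -d "$latest" +%s) )) -gt 86400 ]; then
  echo "ALERT: test-cronjob has not completed successfully in the last 24 hours"
  exit 1
fi

Alternatively, if kube-state-metrics is deployed, the kube_job_status_succeeded and kube_job_status_completion_time metrics can drive the same kind of alert in Prometheus.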

Useful links:

  • [Garbage Collection](https://kubernetes.io/docs/concepts/architecture/garbage-collection/)
  • [TTL controller for finished Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/)

moonkotte