My Kubernetes cluster is hitting memory pressure limits that I need to fix (at a later time). At any given time there are anywhere from a few evicted pods to dozens. I created a CronJob spec to clean up the evicted pods, and the command it runs works fine when I test it from PowerShell.
However, whether or not I specify a namespace in the spec, and even when I deploy the CronJob to every namespace that exists, it never seems to delete my evicted pods.
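For reference, this is the same pipeline I run from PowerShell against the cluster (the command from the container spec below), and there it lists and deletes the Failed pods as expected:

kubectl get pods --all-namespaces --field-selector 'status.phase==Failed' -o json | kubectl delete -f -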
Original Script:
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: delete-evicted-pods
spec:
  schedule: "*/30 * * * *"
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:latest
            command: ["sh", "-c", "kubectl get pods --all-namespaces --field-selector 'status.phase==Failed' -o json | kubectl delete -f -"]
          restartPolicy: OnFailure
I also tried creating the CronJob with associated RBAC (below), with no luck either.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: development
  name: cronjob-runner
rules:
- apiGroups:
  - extensions
  - apps
  resources:
  - deployments
  verbs:
  - 'patch'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cronjob-runner
  namespace: development
subjects:
- kind: ServiceAccount
  name: sa-cronjob-runner
  namespace: development
roleRef:
  kind: Role
  name: cronjob-runner
  apiGroup: ""
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-cronjob-runner
  namespace: development
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: delete-all-failed-pods
spec:
  schedule: "*/30 * * * *"
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-cronjob-runner
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - kubectl get pods --all-namespaces --field-selector 'status.phase==Failed' -o json | kubectl delete -f -
          restartPolicy: OnFailure
I realize I should have better memory limits defined, but this functionality was working before I upgraded k8s from 1.14 to 1.16.
Is there something I'm doing wrong or missing? If it helps, I'm running in Azure (AKS).
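In case it helps, this is roughly how I'm checking things on my side (the pod name is a placeholder, and the namespace assumes the second CronJob is deployed to development alongside its ServiceAccount):

# confirm the CronJob API group/version is still served after the 1.16 upgrade
kubectl api-versions | grep batch

# check that the CronJob actually spawns jobs and pods, then read the runner's output
kubectl get cronjobs,jobs,pods -n development
kubectl logs -n development <kubectl-runner-pod-name>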