
When low on resources, Kubernetes starts to re-create pods, but the newer pods also fail, so they keep growing in number and the cluster becomes unusable. This seems like illogical behaviour. Is it possible to prevent it? Is it possible to recover without deleting everything?

light@o-node0:~/lh-orchestrator$ k get pod
NAME                            READY   STATUS                   RESTARTS        AGE
aa344-detect-5cd757f65d-8kz2n   0/1     ContainerStatusUnknown   536 (62m ago)   46h
bb756-detect-855f6bcc78-jnfzd   0/1     ContainerStatusUnknown   8 (59m ago)     75m
aa344-analyz-5cc6c59d6c-rchkm   0/1     ContainerStatusUnknown   1               46h
lh-graphql-77fc996db5-8qcxl     0/1     ContainerStatusUnknown   1 (2d ago)      2d
lh-pgadmin-5b598d4d4-shjbz      0/1     ContainerStatusUnknown   1               2d
bb756-analyz-8cd7c48f7-k2xh9    0/1     ContainerStatusUnknown   1               75m
lh-postgres-698bc448bd-9vkqp    0/1     ContainerStatusUnknown   1               2d
lh-pgadmin-5b598d4d4-c4ts4      0/1     ContainerStatusUnknown   1               54m
lh-graphql-77fc996db5-btvzx     0/1     ContainerStatusUnknown   1               54m
lh-postgres-698bc448bd-99m55    0/1     ContainerStatusUnknown   1               54m
aa344-detect-5cd757f65d-qmvcc   0/1     ContainerStatusUnknown   1               58m
bb756-detect-855f6bcc78-7lc7g   0/1     ContainerStatusUnknown   1               56m
lh-graphql-77fc996db5-7lbms     1/1     Running                  0               34m
lh-pgadmin-5b598d4d4-l6f7s      0/1     ContainerStatusUnknown   1               34m
aa344-analyz-5cc6c59d6c-78ltt   0/1     ContainerStatusUnknown   1 (17m ago)     55m
lh-postgres-698bc448bd-gjbf2    0/1     ContainerStatusUnknown   1               34m
aa344-detect-5cd757f65d-cbspd   0/1     ContainerStatusUnknown   1               33m
bb756-detect-855f6bcc78-qvqsf   0/1     ContainerStatusUnknown   1               32m
lh-pgadmin-5b598d4d4-4znww      1/1     Running                  0               17m
lh-postgres-698bc448bd-xxm28    1/1     Running                  0               16m
aa344-analyz-5cc6c59d6c-h7vfc   1/1     Running                  3 (9m41s ago)   16m
bb756-analyz-8cd7c48f7-4tdcp    1/1     Running                  7 (10m ago)     54m
bb756-detect-855f6bcc78-fgpzx   0/1     Pending                  0               2s
bb756-detect-855f6bcc78-t4p4q   0/1     ContainerStatusUnknown   1               16m
aa344-detect-5cd757f65d-cd6gl   0/1     ContainerStatusUnknown   1               16m
aa344-detect-5cd757f65d-dwhf6   0/1     Pending                  0               1s

noname7619
    The environments I work with in my day job have the [cluster autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) enabled, so if the cluster is low on resources it provisions more nodes. Setting that up isn't really a programming-related question, though. – David Maze Apr 27 '23 at 09:44

1 Answer


Before deleting the pods, check why they are failing to be created; if it is due to memory or storage pressure on the node, follow the steps below:

  1. If you are using Docker, run docker system prune -a to clean up the space taken by Docker so the node regains some free space, then drain the node and restart Docker (see the first sketch after this list).

  2. By default, a container can write any amount of data to the node's filesystem. Set a quota (limits.ephemeral-storage, requests.ephemeral-storage) to limit this (see the manifest sketch after this list).

  3. You may need to add storage to the nodes if Kubernetes genuinely needs more space.
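
For step 1, one possible sequence is sketched below (the node name o-node0 comes from the shell prompt in the question; on older kubectl versions --delete-emptydir-data may be called --delete-local-data):

    # from a machine with kubectl access: move workloads off the node
    kubectl drain o-node0 --ignore-daemonsets --delete-emptydir-data
    # on the node itself: reclaim disk used by unused images/containers, then restart Docker
    docker system prune -a
    sudo systemctl restart docker
    # allow pods to be scheduled on the node again
    kubectl uncordon o-node0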

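For step 2, the quota goes into each container's resources block in the Deployment spec; a minimal sketch, with sizes that are purely illustrative:

    resources:
      requests:
        ephemeral-storage: "1Gi"
      limits:
        ephemeral-storage: "2Gi"
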
For a Deployment: scale it down so that when the failed pods are deleted, new pods are not created to replace them. Scaling down to 0 makes Kubernetes delete the pods.

Once the node has resources again, scale the Deployment back up so Kubernetes creates fresh replicas of the pods removed by the previous step.
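
For example, using the Deployment that presumably backs the aa344-detect pods in the question's output (substitute your own Deployment names):

    kubectl scale deployment aa344-detect --replicas=0   # stop replacing the failed pods
    kubectl scale deployment aa344-detect --replicas=1   # recreate them once the node has resources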

You can also delete all the pods that are in the Failed phase, without scaling the deployment, by running the command below:

kubectl delete pod --field-selector=status.phase==Failed
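
On reasonably recent kubectl versions, the same selector can be combined with -A (--all-namespaces) if the failed pods are spread across several namespaces.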

You can find more methods for deleting pods in this blog by Oren Ninio.

Generally, it is recommended to use an autoscaler to manage deployments.
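
For example, a Horizontal Pod Autoscaler can manage a Deployment's replica count; the command below is only a sketch (the Deployment name and thresholds are illustrative, and CPU-based autoscaling requires the metrics server to be installed):

    kubectl autoscale deployment aa344-detect --min=1 --max=3 --cpu-percent=80

For the "cluster is out of resources" situation in the question, the cluster autoscaler mentioned in the comment above is the more direct remedy, since it adds nodes rather than pods.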

Sai Chandra Gadde
  • Regarding the indefinite growth issue. As I can see even if you have `replicas: 1` it will still grow indefinitely trying to replace an evicted pod. The only option is to scale down to `0`. Not sure if you meant this. – noname7619 Apr 28 '23 at 06:32
  • You can also set restart policy so the new pods will not get created. – Sai Chandra Gadde Apr 28 '23 at 06:55