
We have a Kubernetes cluster with three worker nodes, which was built manually, borrowing from the 'Kubernetes the Hard Way' tutorial.

Everything on this cluster works as expected, with one exception: the scheduler does not - or seems not to - honor the 110-pods-per-worker-node limit.

Example:

Worker Node 1: 60 pods
Worker Node 2: 100 pods
Worker Node 3: 110 pods

When I want to deploy a new pod, it often happens that the scheduler decides it would be best to schedule the new pod to 'Worker Node 3'. The kubelet refuses to do so, since it does honor its 110-pod limit. The scheduler tries again and again and never succeeds.
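One way to see what the scheduler and kubelet report for these failing attempts (a minimal sketch, assuming kubectl access; the pod name and namespace are placeholders) is to look at the pod's events:

```
# Show the events recorded for a pending/rejected pod, including any FailedScheduling messages
kubectl describe pod <pod-name> -n <namespace>

# Or list recent scheduling failures across the whole cluster
kubectl get events --all-namespaces --field-selector reason=FailedScheduling
```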

I do not understand why this is happening. I think I might be missing some detail about this problem.

From my understanding and what I have read about the scheduler itself, there is no resource or metric for 'number of pods per node' that is considered during scheduling - or at least I haven't found anything suggesting otherwise in the Kubernetes Scheduler documentation. Of course the scheduler considers CPU requests/limits, memory requests/limits, and disk requests/limits - that is all fine and working. So I don't even see how the scheduler could ever take the number of pods already running on a worker into account, but there has to be some kind of functionality doing that, right? Or am I mistaken?
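For what it's worth, the per-node pod capacity is exposed on the Node object itself as a `pods` entry under Capacity/Allocatable, so it can at least be inspected there. A minimal sketch, assuming kubectl access (the node name is a placeholder, and the `-A` context sizes may need adjusting):

```
# Show the node's capacity and allocatable resources; both include a "pods" entry
kubectl describe node <node-name> | grep -A 8 -E '^(Capacity|Allocatable):'

# Print only the allocatable pod count
kubectl get node <node-name> -o jsonpath='{.status.allocatable.pods}{"\n"}'

# Count the pods currently assigned to that node (all namespaces)
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name> --no-headers | wc -l
```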

Is my cluster broken? Is there some misconception I have about how scheduling should/does work?

Kubernetes binary versions: v1.17.2

Edit: Kubernetes version

geruetzel
  • Don't you define in the scheduler that each new pod should be scheduled to worker node 3? – O.Man Dec 29 '20 at 11:06
  • @O.Man no. no affinity or anything. – geruetzel Dec 29 '20 at 11:23
  • Nothing like this? `spec: policy: name: scheduler-policy defaultNodeSelector: type=user-node,region=east` – O.Man Dec 29 '20 at 11:29
  • Is it possible to share more details about your environment? What kubeadm version are you using? Is it a local env or a cloud env? Can you share your YAML manifest? – PjoterS Dec 30 '20 at 07:35
  • @O.Man no. no tags, policies, affinities whatsoever. – geruetzel Dec 30 '20 at 11:05
  • @PjoterS We did not use kubeadm to set up the cluster; we set it up manually using the Kubernetes binaries - currently it is v1.17.4. What YAML manifest do you want to see? The one of a minimal nginx pod? There are no additional settings in the YAML that could bind the pod to a specific worker node, if that is what you are looking for. – geruetzel Dec 30 '20 at 11:05
  • @geruetzel I honestly think it must be some problem with the configuration. Kube-scheduler generates a number for where it tries to schedule a newly created pod. It is possible that the algorithm generates the same random number 2 or 3 times, but not x times. – O.Man Dec 30 '20 at 11:11
  • Just to clarify, you are using nodes with the same resources and pods with almost the same requests/limits, and no taints, policies, or affinities. You have mentioned that the kubelet refuses to schedule on the specific worker. Can you provide logs? Also, would it be possible to change the `Kubectl verbosity`? – PjoterS Jan 04 '21 at 15:01
  • I mean `Scheduler verbosity` – PjoterS Jan 05 '21 at 08:42

1 Answer


Usually this means the other nodes are unsuitable, either explicitly via taints and the like, or more often because of things like resource request space.
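One way to check that against each node (a sketch, not specific to this cluster; the node name is a placeholder) is to compare its taints and already-allocated requests with its allocatable capacity:

```
# Any taints that would make the node unsuitable without a matching toleration
kubectl describe node <node-name> | grep -A 3 '^Taints:'

# Requests/limits already allocated on the node versus its allocatable capacity
kubectl describe node <node-name> | grep -A 12 'Allocated resources:'
```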

coderanger
  • But the other nodes are suitable: enough RAM, enough CPU, enough disk. There is no reason the scheduler would put the pod on this worker, as I see it. – geruetzel Dec 30 '20 at 19:48