How to ensure every kubernetes node running 2 or more pods?

Question

I know that DaemonSets ensure only one pod running in every node. I know that ReplicaSets ensure the number of pod replicas defined are running but doesn't ensure every node gets a pod.

My requirement is every Node should be occupied by one POD and also I should be able to increase pod replica count! Is there any way we can achieve this?

If we can deploy application 2 times with different names i.e 1st time with Daemonsets and next with Replicasets! But is there any better approach? So that deployment can have a single manifest file with single name.

FYI, his am trying to achieve in Google Cloud- GKE.

Do you have a specific reason for wanting two pods on a node; if you add a new node in the cluster, do you want two more pods to be automatically started? Or are you just trying to ensure the hardware you have will get used? — David Maze, May 17 '20 at 17:31
For high availability. In GKE I want my pod to run in all availability zones! I can implement DaemonSets but will end up having only one pod per node which can be less for the load so I need 2 or more PODs per Node. I just found there is something called "Descheduler" but that kills running pod to achieve my above scenario for cluster balancing . — sudhir tataraju, May 17 '20 at 18:21

score 2 · Answer 1 · answered Jul 02 '20 at 22:37

My requirement is every Node should be occupied by one POD and also I should be able to increase pod replica count! Is there any way we can achieve this?

No, at the moment kubernetes doesn't provide a mechanism which would enable you to achive exactly what you want.

Having read carefully your question I would summarize your key requirements as follows:

Pods should be scheduled on every node (as Daemonset does).
At the same time you need to be able to schedule desired number of Pods of a certain kind on all nodes. And of course the number of such Pods will be much bigger than the number of nodes. So you need to be able to schedule more than one Pod of a certain type on each node.
When one of the nodes becomes temporarily unavailable, missing Pods should be scheduled to the remaing nodes to be able to handle same workload.
When node becomes available again, Pods that were moved before to other nodes should be rescheduled on the newly recovered node.

If you need to have more than just a one Pod on every node Daemonset definitely is not a solution you look for as it ensures that exactly one copy of a Pod of a certain kind is running on every node. A few different Daemonsets doesn't seem a good solutions either as Pods would be managed separately in such scenario.

I would also like to refer to @redzack's answer. Taking into consideration all the above requirements, podAntiAffinity doesn't solve this problem at all. Let's suppose you have only those 3 nodes. If you increase your replicas number e.g. to 6 you'll see something like below:

NAME                           READY     STATUS    RESTARTS   AGE       IP           NODE
web-server-1287567482-5d4dz    1/1       Running   0          7m        10.192.2.3   kube-node-1
web-server-1287567482-6f7v5    1/1       Running   0          7m        10.192.4.3   kube-node-3
web-server-1287567482-s330j    1/1       Running   0          7m        10.192.3.2   kube-node-2
web-server-1287567482-5ahfa    1/1       Pending   0          7m        <none>       <none>
web-server-1287567482-ah47s    1/1       Pending   0          7m        <none>       <none>
web-server-1287567482-ajgh7    1/1       Pending   0          7m        <none>       <none>

Due to podAntiAffinity new Pods won't be eligible to be scheduled on those nodes, on which one Pod of this kind is already running. Even if you change the type of podAntiAffinity from requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution it won't meet your requirement as you may end up with any scenario like: 3 pods on node1, 2 pods on node2 and 1 pod on node3 or even only 2 nodes may be used. So in such case it won't work any better than a normal deployment without any affinity/anti-affinity rules.

Furthermore it won't cover point 4 from the above requirements list at all. Once missing node is recovered, nothing will re-schedule to it those Pods that are already running on different nodes. The only solution that can guarantee that when new node appeares/re-appeares, Pod of a certain kind is scheduled on such node, is Daemonset. But it won't cover point 2 and 3. So there is no ideal solution for your use case.

If someone has some better ideas how it cannot be achieved, feel free to join this thread and post your own answer but in my opinion such sulution is simply unavailable at the moment, at least not with the standard kube-scheduler.

If a single copy of a Pod, running on each node is not enough to handle your workload, I would say: simply use standard Deployment with desired numbers of replicas and rely on kube-scheduler to decide on which node it will be scheduled and you can be pretty sure that in most cases it will do it properly and distribute your workload evenly. Well, it won't re-destribute already running Pods on new/recovered node so it's not perfect but I would say for most scenarios it should work very well.

score 0 · Answer 2 · answered May 17 '20 at 20:25

0

First of all, a daemonset can also be configured and restricted not to spawn pods on all the nodes.

Yes, you can achieve a pod to spawn on every node by a deployment object by pod affinity with topologykey set as "kubernetes.io/hostname".

With the above example, you will have the following behaviour:

I hope thats what you are looking for:

answered May 17 '20 at 20:25

redzack

1,521
8
20

in above approach can we have replicas as well? if 2 or more replicas each node can have then ok! even even node3 goes down and comesup node3 should have a pod. – sudhir tataraju May 18 '20 at 05:24
Wait what do you mean by replicas? the fact that above example clearly shows 3 replicas on each node. Are you talking about replicasets? Deployment already uses replicasets and RS is just a subset of deployment. Yes it's obvious you can have multiple replicas. Yes node3 will always have one pod. Also I'm not sure there will be more replicas than the number of nodes as you have defined affinity for the pod. You can always try the example and scale the replicas to 5 or 6 and check how that works out for you. Draining node3 could also be done just to be sure pod3 comes up. – redzack May 18 '20 at 17:07
I hope the answer is clear and acceptable. Please let me know the thoughts and implementation part. Hopefully it should work for you. Thanks – redzack May 19 '20 at 20:32

score 0 · Answer 3 · answered May 01 '22 at 09:16

I'm on the hunt for this specific thing as well. We have a NodeJs app (therefore single-threaded) and we'd like to say something like "keep 5 copies running on each host", but DaemonSet only allows 1.

I am still searching so I'll come back to update this answer (uless the internet tumbleweeds catch me), but I think the way to do it is with Topology Spread Constraints . https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/

It might end up being no different to just running multiple DaemonSets though

How to ensure every kubernetes node running 2 or more pods?

3 Answers3