My requirement is every Node should be occupied by one POD and also I should be able to increase pod replica count! Is there any way we can achieve this?
No, at the moment Kubernetes doesn't provide a mechanism that would let you achieve exactly what you want.
Having read your question carefully, I would summarize your key requirements as follows:
- Pods should be scheduled on every node (as a DaemonSet does).
- At the same time, you need to be able to schedule a desired number of Pods of a certain kind across all nodes. Of course, the number of such Pods will be much bigger than the number of nodes, so you need to be able to schedule more than one Pod of a certain type on each node.
- When one of the nodes becomes temporarily unavailable, the missing Pods should be scheduled to the remaining nodes so that the same workload can still be handled.
- When the node becomes available again, Pods that were previously moved to other nodes should be rescheduled onto the newly recovered node.
If you need more than one Pod on every node, a DaemonSet is definitely not the solution you are looking for, as it ensures that exactly one copy of a Pod of a certain kind is running on every node. Using a few different DaemonSets doesn't seem like a good solution either, as the Pods would be managed separately in such a scenario.
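To illustrate why a DaemonSet cannot be scaled up, here is a minimal sketch of one. The name, labels and image are placeholders I made up for illustration, not something taken from your setup. Note that the spec has no `replicas` field at all: the number of Pods simply follows the number of eligible nodes.

```yaml
# Hypothetical example: name, labels and image are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: web-server
spec:
  # There is no `replicas` field here; one Pod is created per eligible node.
  selector:
    matchLabels:
      app: web-server
  template:
    metadata:
      labels:
        app: web-server
    spec:
      containers:
        - name: web-server
          image: nginx   # placeholder image
```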
I would also like to refer to @redzack's answer. Taking into consideration all the above requirements, `podAntiAffinity` doesn't solve this problem at all. Let's suppose you have only those 3 nodes. If you increase your `replicas` number to, e.g., 6, you'll see something like below:
    NAME                          READY   STATUS    RESTARTS   AGE   IP           NODE
    web-server-1287567482-5d4dz   1/1     Running   0          7m    10.192.2.3   kube-node-1
    web-server-1287567482-6f7v5   1/1     Running   0          7m    10.192.4.3   kube-node-3
    web-server-1287567482-s330j   1/1     Running   0          7m    10.192.3.2   kube-node-2
    web-server-1287567482-5ahfa   0/1     Pending   0          7m    <none>       <none>
    web-server-1287567482-ah47s   0/1     Pending   0          7m    <none>       <none>
    web-server-1287567482-ajgh7   0/1     Pending   0          7m    <none>       <none>
Due to `podAntiAffinity`, new Pods are not eligible to be scheduled on nodes where a Pod of this kind is already running. Even if you change the type of `podAntiAffinity` from `requiredDuringSchedulingIgnoredDuringExecution` to `preferredDuringSchedulingIgnoredDuringExecution`, it won't meet your requirement, as you may end up with any distribution, for example 3 Pods on node1, 2 Pods on node2 and 1 Pod on node3, or even with only 2 nodes being used. In that case it won't work any better than a normal Deployment without any affinity/anti-affinity rules.
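For reference, the kind of anti-affinity setup discussed above would look roughly like the sketch below. I don't know the exact manifest from the other answer, so the `app: web-server` label and the image are assumptions on my side; only the `web-server` name is borrowed from the output above. With 3 nodes and 6 replicas, the required rule leaves 3 Pods in `Pending`.

```yaml
# Hypothetical example: labels and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web-server
  template:
    metadata:
      labels:
        app: web-server
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never place two Pods with this label on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web-server
              topologyKey: kubernetes.io/hostname
      containers:
        - name: web-server
          image: nginx   # placeholder image
```

The preferred variant wraps the same term in a weighted entry (`weight` plus `podAffinityTerm`), which turns the rule into a scheduling hint only, so it gives no guarantee about the final distribution.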
Furthermore, it won't cover point 4 from the above requirements list at all. Once the missing node is recovered, nothing will reschedule onto it those Pods that are already running on other nodes. The only solution that can guarantee that a Pod of a certain kind is scheduled on a node when it appears or reappears is a DaemonSet. But it won't cover points 2 and 3. So there is no ideal solution for your use case.
If someone has a better idea of how this can be achieved, feel free to join this thread and post your own answer, but in my opinion such a solution is simply unavailable at the moment, at least not with the standard kube-scheduler.
If a single copy of a Pod running on each node is not enough to handle your workload, I would say: simply use a standard Deployment with the desired number of replicas and rely on kube-scheduler to decide which node each Pod will be scheduled on. You can be pretty sure that in most cases it will do this properly and distribute your workload evenly. It won't redistribute already running Pods onto a new or recovered node, so it's not perfect, but I would say that for most scenarios it should work very well.
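A minimal sketch of such a plain Deployment could look like this. Again, the name, labels, image and replica count are just placeholders, so adjust them to your workload:

```yaml
# Hypothetical example: name, labels, image and replica count are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 6   # no affinity rules; kube-scheduler picks the nodes
  selector:
    matchLabels:
      app: web-server
  template:
    metadata:
      labels:
        app: web-server
    spec:
      containers:
        - name: web-server
          image: nginx   # placeholder image
```

You can then change the replica count with `kubectl scale deployment web-server --replicas=<N>` and check where the Pods landed with `kubectl get pods -o wide`.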