Why might taints and tolerations not work as expected in EKS?

Question

I am running EKS 1.24 and created two node groups: groupA and groupB. GroupB has the taint "dedicated:druid:NoSchedule", but pods without the toleration "dedicated Equal druid NoSchedule" are still scheduled onto groupB. What could be the reason?

My expectation is that only pods with the toleration "dedicated Equal druid NoSchedule" are scheduled onto groupB.

I also had this problem when creating new node groups with taints in EKS 1.23. Some DaemonSet pods were automatically deployed to the new node group even though they had incorrect tolerations. — Mars, Apr 12 '23 at 09:36
But after I restarted those pods, they were no longer deployed on the incorrect nodes. — Mars, Apr 12 '23 at 10:13

score 0 · Answer 1 · answered Apr 27 '23 at 10:43

I had the same problem again in production, but after I restarted all pods several times, all pods were restored to the correct worker nodes.

Then I noticed something odd: every time I found pods on the wrong worker nodes, the pods and the nodes had been created very close together in time.

So my guess is that if pods and worker nodes start at the same time, a pod may be placed on a worker node before EKS has applied the taint to it, even though the pod does not tolerate that taint.
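To make that guess concrete, here is a minimal Python sketch of how a toleration is matched against a node's taints. It is a simplified model written for illustration, not the scheduler's actual code, and the names `tolerates`, `schedulable`, `DRUID_TAINT` and `DRUID_TOLERATION` are made up for this example. The point is that a node whose taint list is still empty accepts any pod.

```python
# Simplified model of taint/toleration matching (illustration only).

DRUID_TAINT = {"key": "dedicated", "value": "druid", "effect": "NoSchedule"}
DRUID_TOLERATION = {
    "key": "dedicated",
    "operator": "Equal",
    "value": "druid",
    "effect": "NoSchedule",
}


def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if not key:
        # An empty key is only meaningful with Exists (matches every key).
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def schedulable(pod_tolerations, node_taints):
    """A pod fits if every NoSchedule/NoExecute taint on the node is tolerated."""
    blocking = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations) for taint in blocking
    )


if __name__ == "__main__":
    # Node already tainted: a pod without tolerations is rejected.
    print(schedulable([], [DRUID_TAINT]))
    # Node registered but not yet tainted: the same pod is accepted.
    print(schedulable([], []))
```

If the taint really is applied a moment after the node registers, the second case is the window in which a pod without the toleration could land on groupB.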

I tried a few things to solve this problem, and they work in my environment:

Set a nodeSelector or nodeAffinity on the pod, so that the pod is only placed on a worker node that has the matching label.
Change the effect to NoExecute in the taint and toleration (a pod that does not tolerate the taint will be evicted from the node and can then be rescheduled on other worker nodes).
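
As a rough sketch of the first option, the Python snippet below builds a pod manifest that combines a nodeSelector with the toleration from the question and prints it as JSON, which kubectl can apply just like YAML. The node label `dedicated: druid`, the pod name and the image are assumptions for illustration; use whatever label your node group actually carries.

```python
import json


def build_pod_manifest(name, image):
    """Build a pod manifest pinned to the tainted node group (sketch)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Assumed node label: the pod only fits nodes carrying it.
            "nodeSelector": {"dedicated": "druid"},
            # Toleration for the taint described in the question.
            "tolerations": [
                {
                    "key": "dedicated",
                    "operator": "Equal",
                    "value": "druid",
                    "effect": "NoSchedule",
                }
            ],
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, for illustration only.
    manifest = build_pod_manifest("druid-example", "example/druid:latest")
    print(json.dumps(manifest, indent=2))
```

Pods that must stay off groupB can be given a nodeSelector for a label that only groupA nodes carry. For the second option, the effect in both the taint and the toleration would be NoExecute instead of NoSchedule.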

I hope this information helps you resolve your issue.
