
I have created a k8s cluster with kops (1.21.4) on AWS and, as per the docs on the cluster autoscaler, I have made the required changes to my cluster. But when the cluster starts, the cluster-autoscaler pod cannot be scheduled on any node. When I describe the pod, I see the following:

Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  4m31s (x92 over 98m)  default-scheduler  0/4 nodes are available: 1 Too many pods, 3 node(s) didn't match Pod's node affinity/selector.

Looking at the deployment for the cluster autoscaler, I see the following podAntiAffinity:

      affinity:                                                                 
        podAntiAffinity:                                                        
          preferredDuringSchedulingIgnoredDuringExecution:                      
          - podAffinityTerm:                                                    
              labelSelector:                                                    
                matchExpressions:                                               
                - key: app                                                      
                  operator: In                                                  
                  values:                                                       
                  - cluster-autoscaler                                          
              topologyKey: topology.kubernetes.io/zone                          
            weight: 100                                                         
          requiredDuringSchedulingIgnoredDuringExecution:                       
          - labelSelector:                                                      
              matchExpressions:                                                 
              - key: app                                                        
                operator: In                                                    
                values:                                                         
                - cluster-autoscaler                                            
            topologyKey: kubernetes.io/hostname

From this I understand that it wants to prevent running the pod on a node which already has a cluster-autoscaler pod running. But that doesn't seem to explain the error seen in the pod status.

Edit: The autoscaler pod has the following nodeSelector and tolerations:

Node-Selectors:              node-role.kubernetes.io/master=
Tolerations:                 node-role.kubernetes.io/master op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

So clearly, it should be able to be scheduled on the master node too.

I am not sure what else I need to do to get the pod up and running.
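
For reference, one way to check whether the master is simply out of pod capacity is something like this (<master-node-name> is a placeholder for the actual master node name):

# Allocatable pod count on the node (driven by kubelet's --max-pods)
kubectl describe node <master-node-name> | grep -A 6 Allocatable

# Pods already scheduled on that node
kubectl get pods --all-namespaces --field-selector spec.nodeName=<master-node-name> --no-headers | wc -l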

  • Check [this answer](https://stackoverflow.com/questions/64965832/aws-eks-only-2-pod-can-be-launched-too-many-pods-error); it may be related to the error you're having. You should work out why there's a `Too many pods` error, because that part is about the master node. The other nodes not matching the affinity/selector is expected. – moonkotte Jan 07 '22 at 12:08
  • @moonkotte I am using t3.small instances and awsvpc for networking, so that means I can have at most 3 (network interfaces) * 4 (IP addresses per network interface) = 12 IP addresses. But I can see in the AWS console that not all of the network interfaces have been assigned to the instance, so I have not yet exhausted the network interfaces on the master node and assigning more pods should be possible. – Divick Jan 07 '22 at 14:19
  • But I can also see that there are 12 pods already running, and https://github.com/aws/amazon-vpc-cni-k8s/tree/master#cni-configuration-variables suggests setting --max-pods equal to (ENIs × (the number of IPs per ENI - 1)) + 2, which equals (3*(4-1))+2 = 11 (a quick back-of-the-envelope check is sketched after these comments). Also, as per https://kops.sigs.k8s.io/networking/aws-vpc/, it seems every pod gets an IP from the VPC network. Nevertheless, I will try to run more masters to see if the issue is resolved. – Divick Jan 07 '22 at 14:22
  • @moonkotte Thanks for pointing me in the right direction. The problem was the maximum number of IPs one can have with awsvpc networking for a given instance type. I have now switched to a t3.medium instance and it works like a charm. Although it works, my question in the comment above still stands. – Divick Jan 07 '22 at 18:36
  • @moonkotte If you could put it as an answer, I can accept the answer as correct. – Divick Jan 08 '22 at 07:33
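
A quick back-of-the-envelope check of the CNI pod limit mentioned in the comments above (the ENI and IP-per-ENI figures are the published values for t3.small and t3.medium; treat this as a sketch):

# max pods = ENIs * (IPv4 addresses per ENI - 1) + 2
echo "t3.small:  $(( 3 * (4 - 1) + 2 ))"   # 11
echo "t3.medium: $(( 3 * (6 - 1) + 2 ))"   # 17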

2 Answers


Posting the answer out of comments.


There are affinity rules in place, so the first thing to check is whether any scheduling errors are reported. Which is the case:

0/4 nodes are available: 1 Too many pods, 3 node(s) didn't match Pod's node affinity/selector.

Since there is 1 control plane node (on which the pod is supposed to be scheduled) and 3 worker nodes, the `1 Too many pods` part of the error relates to the control plane node.


Since the cluster is running on AWS, there's a known limit on the number of network interfaces and private IP addresses per machine type - see IP addresses per network interface per instance type.

A t3.small was used, which has 3 network interfaces and 4 IPs per interface = 12 IPs in total, which was not enough.
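
If you prefer to look these limits up with the CLI rather than the documentation table, something along these lines should work (assumes a configured AWS CLI; the field names come from the EC2 DescribeInstanceTypes API):

aws ec2 describe-instance-types \
  --instance-types t3.small t3.medium \
  --query 'InstanceTypes[].[InstanceType, NetworkInfo.MaximumNetworkInterfaces, NetworkInfo.Ipv4AddressesPerInterface]' \
  --output table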

Scaling up to t3.medium resolved the issue.
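
In a kops cluster the change itself is just a machineType edit on the relevant instance group; roughly like this (the group name master-<az> is a placeholder for whatever `kops get ig` shows):

# find the control-plane instance group name
kops get instancegroups

# set spec.machineType: t3.medium in the editor that opens
kops edit instancegroup master-<az>

# apply and roll out the change
kops update cluster --yes
kops rolling-update cluster --yes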


Credits to Jonas's answer about the root cause.

moonkotte

You need to check the pod/deployment for the nodeSelector property and make sure that your desired nodes have the corresponding label.
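
A minimal way to compare the two, assuming the autoscaler runs as the usual cluster-autoscaler deployment in kube-system (adjust the names to your setup):

# nodeSelector the pod template asks for
kubectl -n kube-system get deployment cluster-autoscaler \
  -o jsonpath='{.spec.template.spec.nodeSelector}'

# labels actually present on the nodes
kubectl get nodes --show-labels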

Also, if you want to schedule pods on the master node, you must first remove the taint:

kubectl taint nodes --all node-role.kubernetes.io/master-
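
Before removing anything, it may be worth checking which taints are actually on the master (node name is a placeholder):

kubectl get node <master-node-name> -o jsonpath='{.spec.taints}'
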
Rakesh Gupta
  • Please see the edits in my question. The nodeSelector as well as the tolerations are set to node-role.kubernetes.io/master, so ideally the pod should have been scheduled on the master node. – Divick Jan 07 '22 at 03:43
  • The master node generally has taints. Please remove the taint using kubectl taint nodes --all node-role.kubernetes.io/master- – Rakesh Gupta Jan 07 '22 at 04:11
  • If I understand correctly, tolerations are there to allow pods to run even on tainted nodes, so I don't think just removing the taint is sufficient. Moreover, that would also allow normal pods to be scheduled on the masters, which I don't want at all. – Divick Jan 07 '22 at 06:58
  • BTW I tried that too and it didn't help. – Divick Jan 07 '22 at 07:05