
I have a really weird problem that I can't figure out. I am trying to set up a self-managed MinIO tenant with 3 pods of 3 drives each, one for each node in my 3-node K3S setup running in an embedded etcd HA configuration:

NAME        STATUS   ROLES                       AGE    VERSION
k3s-node1   Ready    control-plane,etcd,master   121d   v1.25.3+k3s1
k3s-node2   Ready    control-plane,etcd,master   121d   v1.25.3+k3s1
k3s-node3   Ready    control-plane,etcd,master   121d   v1.25.3+k3s1
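
For reference, the pool part of the Tenant resource looks roughly like this (a minimal sketch from memory; <my tenant name> and the storage request are placeholders, and field names may differ slightly between operator versions):

apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: <my tenant name>
spec:
  pools:
  - name: pool-1
    servers: 3
    volumesPerServer: 3
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi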

None of the nodes have any taints:

[root@k3s-node1 minIO]# kubectl describe nodes | grep Taints
Taints:             <none>
Taints:             <none>
Taints:             <none>

Creating the tenant is done through the MinIO operator. I deployed it dozens of times on my setup while testing Helm charts and it worked fine, but since a day ago one pod keeps sitting in "Pending" when it tries to schedule on k3s-node2 specifically, despite that node showing as Ready above. The pods for the other two nodes deploy successfully.

When printing the YAML of the pod (which is identical to the working pods apart from the hostname), the affinity looks like this:

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: v1.min.io/tenant
            operator: In
            values:
            - <my tenant name>
          - key: v1.min.io/pool
            operator: In
            values:
            - pool-1
        topologyKey: kubernetes.io/hostname
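
As far as I understand it, this rule means the scheduler will only place the pod on a node (by kubernetes.io/hostname) that isn't already running a pod carrying both of those labels, i.e. one tenant pod per node. A quick way to list the pods that count against the rule and which nodes they sit on (the tenant name is a placeholder):

kubectl get pods --all-namespaces -o wide -l v1.min.io/tenant=<my tenant name>,v1.min.io/pool=pool-1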

k3s-node2 has plenty of CPU, RAM, and disk space. The pod that fails to deploy to k3s-node2 shows this status on pod describe:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-03-20T05:08:11Z"
    message: '0/3 nodes are available: 3 node(s) didn''t satisfy existing pods anti-affinity
      rules. preemption: 0/3 nodes are available: 3 No preemption victims found for
      incoming pod.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
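
If I read that message correctly, the scheduler believes every node, including k3s-node2, already hosts a pod matching the anti-affinity selector. To see whether anything is still counted against k3s-node2 (including pods stuck in Terminating), something like this should work:

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=k3s-node2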

I've always deployed into a dedicated namespace, and when testing I've always fully deleted the entire namespace before attempting to redeploy, so I don't think there should be any lingering resources. I've also run

kubectl get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name,NAMESPACE:.metadata.namespace --all-namespaces

and see no duplicate pods.

What am I doing wrong here? I've changed zero configuration since it last deployed perfectly, but for some reason the topologyKey doesn't seem to recognize k3s-node2, even though the node shows as Ready.
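
Since the topologyKey relies on the kubernetes.io/hostname label, one sanity check (I'd expect the label to simply equal the node name) is:

kubectl get nodes -L kubernetes.io/hostname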

EDIT:

Just in case my issue was node roles, I uninstalled K3S on node 2 and node 3 and re-installed it (making sure it was installed as an agent, not a server), then added the worker label (the command I used is shown after the node listing below):

NAME        STATUS   ROLES                       AGE     VERSION
k3s-node1   Ready    control-plane,etcd,master   122d    v1.25.3+k3s1
k3s-node2   Ready    worker                      20m15s  v1.25.7+k3s1
k3s-node3   Ready    worker                      12m10s  v1.25.7+k3s1
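
The worker label was added with something along these lines (the exact role label key is my assumption, based on how kubectl derives the ROLES column):

kubectl label node k3s-node2 node-role.kubernetes.io/worker=worker
kubectl label node k3s-node3 node-role.kubernetes.io/worker=worker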

Still doesn't work.

Edit 2:

I deleted all namespaces, drained node 2 and node 3, deleted those nodes from the cluster, and uninstalled K3S on ALL nodes to do a full restart. The pods deploy now... I have no idea what was causing this. It's a good thing K3S is so easy to set up.
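
For the record, the full reset was roughly this sequence (a sketch; the server URL and token are placeholders, and the uninstall script name depends on whether the node was installed as a server or an agent):

# drain and remove the nodes being reset
kubectl drain k3s-node2 --ignore-daemonsets --delete-emptydir-data
kubectl delete node k3s-node2

# uninstall K3S on every node
/usr/local/bin/k3s-uninstall.sh        # server install
/usr/local/bin/k3s-agent-uninstall.sh  # agent install

# reinstall: server on node 1, then join the other nodes as agents
curl -sfL https://get.k3s.io | sh -
curl -sfL https://get.k3s.io | K3S_URL=https://k3s-node1:6443 K3S_TOKEN=<node token> sh -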

  • All nodes are marked as control-plane and master nodes, so in this case pods won't get scheduled onto them. The following link has details: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-nodes-by-condition. You can add worker nodes to the cluster and pods will get scheduled after that. – Nataraj Medayhal Mar 20 '23 at 05:56
  • @Nataraj Medayhal I followed this guide to set up K3S: https://docs.k3s.io/datastore/ha-embedded and didn't have issues with pod affinity before. Why has it changed now? Also, node 1 and node 3 are both marked as master and control-plane but they deploy just fine. – John Kim Mar 20 '23 at 06:02
  • In the shared link the instructions create a cluster in HA mode, which comprises multiple control-plane/master nodes. Once you create a master on one node, you need to execute the following command on the other nodes to join them to the cluster: curl -sfL https://get.k3s.io | K3S_URL=https://myserver:6443 K3S_TOKEN=mynodetoken sh - (this link has the instructions: https://docs.k3s.io/quick-start), instead of running the command with the "server" flag on all machines, which creates only master nodes. – Nataraj Medayhal Mar 20 '23 at 06:31
  • @Nataraj Medayhal I did exactly that. For K3S HA embedded the roles show as master and control-plane despite installing as an agent; the documentation screenshot shows that as well. Just in case, I uninstalled K3S on node 2 and reinstalled with the command format you posted. It still shows the same roles, and still the same issue. – John Kim Mar 20 '23 at 06:40
  • You can run /usr/local/bin/k3s-uninstall.sh, which will uninstall K3S on a node. Kindly run it on all the master nodes. After that, follow the single-master-node cluster steps in https://docs.k3s.io/quick-start: to install the master/control plane, run curl -sfL https://get.k3s.io | sh -, which will bring up the master node, and on the other two nodes run curl -sfL https://get.k3s.io | K3S_URL=https://myserver:6443 K3S_TOKEN=mynodetoken sh -. Once you follow the above steps it should bring up a k8s cluster with a single master node and two worker nodes. – Nataraj Medayhal Mar 20 '23 at 06:53
  • @NatarajMedayhal check my edit. I seriously doubt the node labels are the issue. I had the pods up and running with all three nodes as master before. There's no reason for it to have worked before and suddenly not work now. – John Kim Mar 20 '23 at 06:56
  • Now I can see one of the nodes as a worker node, so pods should schedule there. As per the MinIO documentation, the pod anti-affinity would have prevented the pod from being scheduled earlier: "The Operator by default uses pod anti-affinity, such that the Kubernetes cluster must have at least one worker node per MinIO server pod. Use the pod placement pane to modify the pod scheduling settings for the Tenant." https://min.io/docs/minio/kubernetes/upstream/operations/install-deploy-manage/deploy-minio-tenant.html. I trust you are using the same MinIO. – Nataraj Medayhal Mar 20 '23 at 07:54
  • @NatarajMedayhal I reinstalled all nodes including the master node and it works. When I reinstalled node 2 and 3 as workers it still wouldn't deploy to node 2... I can't say for sure whether node labels were the issue, but it is now resolved. Thanks for the help – John Kim Mar 20 '23 at 08:18
  • @John Kim Is your issue resolved? If yes, can you provide the resolution steps you followed as an answer, for greater visibility to the community? – Veera Nagireddy Mar 20 '23 at 13:09
  • @VeeraNagireddy It's underneath my Edit 2. I essentially reinstalled the entire cluster from scratch, so in reality I didn't really "fix" it. I don't know what the cause was. – John Kim Mar 20 '23 at 13:29

0 Answers