I have a really weird problem that I can't figure out. I am trying to set up a self-managed MinIO tenant with 3 pods of 3 drives each, one pod per node, on my 3-node K3s cluster running in an embedded-etcd HA configuration:
NAME        STATUS   ROLES                       AGE    VERSION
k3s-node1   Ready    control-plane,etcd,master   121d   v1.25.3+k3s1
k3s-node2   Ready    control-plane,etcd,master   121d   v1.25.3+k3s1
k3s-node3   Ready    control-plane,etcd,master   121d   v1.25.3+k3s1
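For context, the pool section of the tenant spec is roughly the following (a sketch; the name, namespace, and storage request are placeholders for my real values):
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: <my tenant name>
  namespace: <my namespace>
spec:
  pools:
  - name: pool-1
    servers: 3            # one MinIO pod per node
    volumesPerServer: 3   # three drives per pod
    volumeClaimTemplate:
      metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi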
None of the nodes have any taints:
[root@k3s-node1 minIO]# kubectl describe nodes | grep Taints
Taints: <none>
Taints: <none>
Taints: <none>
The tenant is created through the MinIO operator. I have deployed it dozens of times on this setup while testing Helm charts and it always worked, but since a day ago one pod stays stuck in Pending whenever it is scheduled to k3s-node2, even though the node shows as Ready above. The pods destined for the other two nodes deploy successfully.
The YAML of the pending pod is identical to that of the working pods apart from the hostname, and its affinity looks like this:
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: v1.min.io/tenant
            operator: In
            values:
            - <my tenant name>
          - key: v1.min.io/pool
            operator: In
            values:
            - pool-1
        topologyKey: kubernetes.io/hostname
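As I understand it, this rule only blocks a node that already runs another pod carrying both of those labels, so anything that could conflict should show up with something like (tenant name is a placeholder, as above):
kubectl get pods --all-namespaces -o wide -l v1.min.io/tenant=<my tenant name>,v1.min.io/pool=pool-1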
k3s-node2 has plenty of CPU, RAM, and disk space. Describing the pod that fails to schedule there shows this status:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-03-20T05:08:11Z"
    message: '0/3 nodes are available: 3 node(s) didn''t satisfy existing pods anti-affinity
      rules. preemption: 0/3 nodes are available: 3 No preemption victims found for
      incoming pod.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
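For completeness, the matching FailedScheduling events can be listed with something like this (the namespace is a placeholder):
kubectl get events -n <tenant namespace> --field-selector reason=FailedScheduling --sort-by=.lastTimestamp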
I always deploy into a dedicated namespace, and during testing I always delete the entire namespace before redeploying, so there shouldn't be any lingering resources. I've also run
kubectl get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name,NAMESPACE:.metadata.namespace --all-namespaces
and saw no duplicate pods.
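A narrower check for anything still bound to k3s-node2 specifically (including pods stuck in Terminating) would be:
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=k3s-node2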
What am I doing wrong here? I've changed no configuration since it last deployed perfectly, yet the anti-affinity rule keyed on kubernetes.io/hostname doesn't seem to accept k3s-node2, despite the node showing as Ready.
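For what it's worth, the label the topologyKey refers to is normally set by the kubelet to the node's hostname, and it can be checked per node with:
kubectl get nodes -L kubernetes.io/hostname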
EDIT:
Just in case my issue was node roles, I uninstalled K3s on node 2 and node 3 and reinstalled them (making sure they run as agents rather than servers), then added a worker label:
NAME        STATUS   ROLES                       AGE      VERSION
k3s-node1   Ready    control-plane,etcd,master   122d     v1.25.3+k3s1
k3s-node2   Ready    worker                      20m15s   v1.25.7+k3s1
k3s-node3   Ready    worker                      12m10s   v1.25.7+k3s1
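The worker role shown above was added by hand, since K3s agents get no role label by default; roughly:
kubectl label node k3s-node2 node-role.kubernetes.io/worker=worker
kubectl label node k3s-node3 node-role.kubernetes.io/worker=worker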
Still doesn't work.
Edit 2:
I deleted all the namespaces, drained node 2 and node 3, removed them from the cluster, and uninstalled K3s on ALL nodes to start over from scratch. The pods deploy now... I have no idea what was causing this. It's a good thing K3s is so easy to set up.