
I have a k8s cluster set up using kubespray.

Last week one of my k8s nodes ran very low on storage, so all of its pods were evicted, including some important pods like calico-node and kube-proxy (I had thought these pods were critical and would never be evicted, no matter what).

After that, all the calico-node pods became not ready. When I checked the logs, I saw: Warning: Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.xxx, where 192.168.0.xxx is the IP of the problematic node above.
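
For reference, a quick way to inspect the BGP status is to run calicoctl from an affected node. A minimal sketch, assuming calicoctl is installed on the node (kubespray can deploy it, but this is not guaranteed):

  # Find the calico-node pods and the node each one runs on
  kubectl get pods -n kube-system -o wide | grep calico-node

  # On the affected node: list BIRD's BGP peers; the broken peer
  # will show a state other than "Established"
  sudo calicoctl node status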

My question is: how can I restore that node? Is it safe to just run kubespray's cluster.yml again?

My k8s version is v1.13.3

Thanks.

Hiep Ho
  • Is kubelet running and posting status to Kube api-server? – shashank tyagi Feb 19 '20 at 09:00
  • kubelet is running, but cannot connect to api-server since kube-proxy on that node is not running – Hiep Ho Feb 19 '20 at 09:01
  • Kube-proxy is a static pod. It doesn't run as a deployment/daemonset etc. It's directly managed by kubelet. So check the manifests folder (default /etc/kubernetes/manifests) to make sure that kube-proxy-XXX.yaml is present. Anyway, kubelet is the primary node agent and communicates directly with the api-server. It doesn't need kube-proxy to post status to the api-server. Can you post the kubelet logs? – shashank tyagi Feb 19 '20 at 09:05
  • Thanks, I will check that again – Hiep Ho Feb 19 '20 at 09:08
  • kube-proxy is not a static pod. It's a daemonset which is installed by kubeadm during the init phase, like coredns. You can read the details using the command: `kubeadm init phase addon kube-proxy --help` BTW, kubespray also uses kubeadm to initialize the Kubernetes cluster, so it works in exactly the same way (see the verification sketch below). https://github.com/kubernetes-sigs/kubespray/blob/a901b1f0d7777cac7bbf51b84cfb2962e5642341/roles/kubernetes/master/templates/kubeadm-config.v1beta2.yaml.j2#L296 – VAS Feb 25 '20 at 11:53
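
Following up on the comment thread, a minimal way to check kubelet health and confirm how kube-proxy is deployed (assumes systemd and kubeadm defaults):

  # Is kubelet running, and what is it logging?
  systemctl status kubelet
  journalctl -u kubelet --since "10 min ago" --no-pager

  # The static-pod manifests folder holds only control-plane pods;
  # kube-proxy runs as a DaemonSet instead
  ls /etc/kubernetes/manifests
  kubectl get daemonset kube-proxy -n kube-system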

1 Answer


When a node comes under disk pressure, the kubelet sets the node's DiskPressure condition to True and adds a taint to the node: Taints: node.kubernetes.io/disk-pressure:NoSchedule.
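
You can check for the taint directly (the node name below is a placeholder):

  kubectl describe node node1 | grep -i taints
  # or, with JSONPath:
  kubectl get node node1 -o jsonpath='{.spec.taints}'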

All pods running on this node get evicted, except critical static pods such as kube-apiserver, kube-controller-manager and kube-scheduler; the eviction manager spares those pods with the error message: cannot evict a critical static pod [...]
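
Evicted pods are left behind in the Failed phase, so you can list them and clean them up once you are done debugging. A sketch:

  # List failed/evicted pods across all namespaces
  kubectl get pods --all-namespaces --field-selector=status.phase=Failed

  # Delete them per namespace once they are no longer needed
  kubectl delete pods -n kube-system --field-selector=status.phase=Failed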

Once the node is freed from disk pressure, the DiskPressure condition changes back to False and the previously added taint is removed, so the node can schedule pods again. You can check this by running kubectl describe node <node_name>. In the Conditions field you should see that DiskPressure has changed to False, which means the node has enough space available. Similar information can also be found in the Events field:

  Normal   NodeReady                1s                     kubelet, node1     Node node1 status is now: NodeReady
  Normal   NodeHasNoDiskPressure    1s (x2 over 1s)        kubelet, node1     Node node1 status is now: NodeHasNoDiskPressure
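
How to actually free the space depends on what filled the disk; unused container images and logs are common culprits. A sketch, assuming a Docker container runtime:

  # See what is consuming space on the node
  df -h
  sudo du -sh /var/lib/docker /var/log

  # Remove unused images, stopped containers and dangling data
  # (Docker runtime assumed; adapt for containerd/CRI-O)
  sudo docker system prune -a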

After confirming that the node is Ready and has sufficient disk space, you can restart kubelet and run kubespray's cluster.yml; the pods will be redeployed on the node. You just have to make sure the node is ready to handle the deployments.
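
A sketch of those last steps (the inventory path is an assumption; adjust it to your kubespray checkout):

  # On the affected node
  sudo systemctl restart kubelet

  # From your kubespray directory, replay the cluster playbook
  ansible-playbook -i inventory/mycluster/hosts.ini cluster.yml -b

  # If calico-node pods remain NotReady afterwards, recreating them
  # usually re-establishes the BGP sessions
  kubectl delete pod -n kube-system -l k8s-app=calico-node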

kool