
We are deploying Jenkins on a Kubernetes cluster (1 master and 4 worker nodes) with the Calico network plugin. Pods are created at the time of job runs in Jenkins, but the issue is that hostnames don't resolve, with no error logs in Jenkins. On checking the pods, the calico-node pod on the master node is down; I'm not sure whether this is the cause of the above problem.

[root@kmaster-1 ~]#  kubectl get pod calico-node-lvvx4 -n kube-system -o wide
NAME                READY   STATUS    RESTARTS   AGE   IP             NODE                                  NOMINATED NODE   READINESS GATES
calico-node-lvvx4   0/1     Running   9          9d    x0.x1.x5.x6   kmaster-1.b.x.x.com   <none>           <none>



Events:
  Type     Reason     Age                       From                                          Message
  ----     ------     ----                      ----                                          -------
  Warning  Unhealthy  107s (x34333 over 3d23h)  kubelet, kmaster-1.b.x.x.com  (combined from similar events): Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 10.x1.2x.x23,10.x1.x7.x53,10.x1.1x.1x5,10.x1.2x.1x22020-04-12 08:40:48.567 [INFO][27813] health.go 156: Number of node(s) with BGP peering established = 0

10.x1.2x.x23, 10.x1.x7.x53, 10.x1.1x.1x5, and 10.x1.2x.1x2 are the IPs of the calico pods on the worker nodes. They are connected among themselves, as netstat shows BGP established, but not with the master. Port 179 is open on the master, so I'm not sure why BGP peering doesn't establish. Kindly advise.
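One way to see the peering state from the master is `calicoctl node status`. The snippet below only illustrates reading that output; the table is a mocked sample of what an unhealthy node typically prints (not output from this cluster), and on a live node you would run `sudo calicoctl node status` directly:

```shell
# Mocked sample of `calicoctl node status` output for a node whose BGP
# sessions never come up; a healthy peer shows STATE "up" and
# INFO "Established".
sample_status='IPv4 BGP status
+--------------+-------------------+-------+----------+-----------------------------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |              INFO                 |
+--------------+-------------------+-------+----------+-----------------------------------+
| 10.0.0.2     | node-to-node mesh | start | 08:40:48 | Active Socket: Connection refused |
+--------------+-------------------+-------+----------+-----------------------------------+'

# Count established sessions; 0 matches the "BGP peering established = 0"
# seen in the kubelet readiness-probe events above.
established=$(printf '%s\n' "$sample_status" | grep -c 'Established')
echo "Established peers: $established"
```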

Sanjay M. P.
  • Have you tried going through the Calico troubleshooting docs? [There](https://docs.projectcalico.org/maintenance/troubleshooting#error-caliconode-is-not-ready-bird-is-not-ready-bgp-not-established-with-10001) is exactly the error you have. I assume you use kubeadm; did you add --pod-network-cidr as mentioned [here](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network)? – Jakub Apr 14 '20 at 09:07
  • @jt97 Yes, I did go through them earlier. My CoreDNS is up and running, but I'm not sure how to establish the BGP session between the master's calico pod and the workers' calico pods; the master node and the worker nodes are reachable from each other. – Sanjay M. P. Apr 14 '20 at 09:19
  • Have you tried to specify the interface as mentioned in this [stackoverflow answer](https://stackoverflow.com/a/57943570/11977760)? Maybe try to install a newer/older version of calico? – Jakub Apr 16 '20 at 06:17
  • @jt97 Thanks for your time. Yes, I tried editing it at run time with the mentioned changes, but it didn't accept them, so I'll re-install calico and check. – Sanjay M. P. Apr 16 '20 at 07:54

4 Answers


Adding the lines below to the calico YAML did the magic.

Specify the interface:

        - name: IP_AUTODETECTION_METHOD
          value: "interface=ens."
Sanjay M. P.
    It's hard to tell if you're specifying a specific interface or using the regex function. If using a regex, `value: "interface=ens."` needs to be `value: "interface=ens.*"`; "ens." does not work. – Dave Apr 11 '21 at 19:37

What Sanjay M. P. shared worked for me; however, I want to clarify what caused the problem and why the solution works, in some more detail.

First of all, I am running an Ubuntu environment, so what Piknik shared does not apply: firewalld is only on CentOS/RHEL systems. Even so, ufw was disabled on all nodes.

I was able to narrow down the exact error causing this problem by running kubectl describe pod calico-node-*****. What I found was that calico's BIRD service could not connect to its peers. The output also showed the IP addresses the calico-node was trying to use for its BGP peers: it was using the wrong interface, and thereby the wrong IPs.

To define the problem for myself: all of my node host VMs have multiple interfaces. If you don't explicitly specify which interface to use, calico "automatically" picks one, whether you want that interface or not.

The solution was to specify the exact interface when you build your calico overlay network in the calico.yaml file. Sanjay M. P. uses a regex, which MAY work if you have differently named interfaces; however, as I am running Ubuntu Server, all interface names start with "ens", so the same problem happens.
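To make the failure mode concrete, here is a small illustration. The interface names are hypothetical, and this grep merely stands in for Calico's matcher (not its exact semantics): with several "ens*" NICs, a loose pattern simply selects the first match, which may not be the NIC you want for BGP.

```shell
# Hypothetical interface list from an Ubuntu server with multiple NICs.
interfaces="ens192
ens224
lo"

# A loose pattern like "ens" matches ens192 first, even if ens224 is the
# interface that actually carries node-to-node traffic.
first_match=$(printf '%s\n' "$interfaces" | grep -E '^ens' | head -n1)
echo "first match: $first_match"

# Matching the full name selects the intended interface.
exact_match=$(printf '%s\n' "$interfaces" | grep -E '^ens224$')
echo "exact match: $exact_match"
```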

I have stripped out most of the calico.yaml file to show the exact location where this setting should go (around line 675). Add the setting there. I also left in CALICO_IPV4POOL_CIDR, as this needs to be set to the same subnet range specified at kubeadm initialization:

spec:
  template:
    spec:
      containers:
        - name: calico-node
          image: calico/node:v3.14.2
          env:
            - name: CALICO_IPV4POOL_CIDR
              value: "192.168.0.0/22"
            - name: IP_AUTODETECTION_METHOD
              value: "interface=ens224"
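The CALICO_IPV4POOL_CIDR point above is really just an invariant: the pool CIDR in calico.yaml must equal the pod network CIDR given to kubeadm init. A trivial sketch of that check, using the example values from this answer:

```shell
# Both values are examples; the only requirement is that they agree.
KUBEADM_POD_CIDR="192.168.0.0/22"       # passed as --pod-network-cidr to kubeadm init
CALICO_IPV4POOL_CIDR="192.168.0.0/22"   # set in calico.yaml

if [ "$KUBEADM_POD_CIDR" = "$CALICO_IPV4POOL_CIDR" ]; then
  echo "CIDRs match"
else
  echo "CIDR mismatch: pods would get IPs outside the cluster pod CIDR" >&2
  exit 1
fi
```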

Unfortunately I did not find a way to roll back the older configuration, so I just rebuilt the whole cluster and redeployed the calico overlay (thank god for VM snapshots).

kubeadm init your cluster. Then run kubectl create -f calico.yaml with the setting added to build out the overlay network.

Confirm the overlay network is working:

  • Run watch -n1 kubectl get pods -n kube-system -o wide, and then add your nodes. Make sure all calico-node pods being built on newly added kube nodes come up as "1/1 Running".
  • Download and install calicoctl, then run calicoctl node status and make sure the correct network is being used for BGP.

You can read more about IP_AUTODETECTION_METHOD here.

Dave

In addition to Sanjay M. P.'s answer, I'll also say that I had to turn off the firewall.

systemctl disable --now firewalld

Maybe you can tweak it somehow instead, but I haven't tested that, so I won't advise it.

Piknik

Activate the network interface used by calico through firewall-cmd.

Recently, while operating a node as a virtual machine, the VM's interface had been deactivated, causing the same phenomenon.
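An untested sketch of what that could look like with firewall-cmd; ens224 is a placeholder for whatever interface calico actually uses on your node:

```shell
# Untested sketch: put the interface calico uses into a zone that allows
# its traffic (ens224 is a placeholder), then reload firewalld.
sudo firewall-cmd --permanent --zone=trusted --add-interface=ens224
sudo firewall-cmd --reload

# Alternatively, keep the zone strict and open just BGP (TCP 179):
#   sudo firewall-cmd --permanent --add-port=179/tcp
#   sudo firewall-cmd --reload
```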