
I have deployed a kubeadm-based Kubernetes cluster v1.24.3 consisting of one control-plane node and 3 worker nodes (all CentOS 7 VMs). They are all hosted "on premises" on a single physical machine.

On this setup, I am trying to deploy a CNI network plugin, but the CNI provider containers are failing on the worker nodes; the error reported by kubectl logs is 'Get "https://10.96.0.1:443/api?timeout=32s": dial tcp 10.96.0.1:443: connect: no route to host'.

The pod deployed on the control-plane node is running without errors.

I get this behaviour whether I install Calico's tigera-operator or Weave Net. Weave Net deploys a DaemonSet whose pod on the control-plane node runs successfully, but the pods deployed on the worker nodes fail with the error above.

For Calico's tigera-operator, a single pod is deployed on one of the worker nodes, and this too fails with the error above.

When I ssh into the control-plane node and run the command "nc -w 2 -v 10.96.0.1 443", I get connected. When I run the same command on any of the worker nodes, the connection is not established and I get the message "Ncat: Connection timed out.".
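For completeness, a few further checks that can be run from a worker node in a setup like this (a sketch, assuming CentOS 7 with firewalld and iptables available, and kube-proxy running in iptables mode; substitute your control-plane node's address):

# How does the worker route traffic to the service IP?
ip route get 10.96.0.1

# Has kube-proxy programmed NAT rules for the kubernetes service?
iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1

# What does the default CentOS 7 firewall (firewalld) currently allow?
firewall-cmd --get-active-zones
firewall-cmd --list-all

# Can the worker reach the API server directly on its real address and port?
nc -w 2 -v <control-plane-node-IP> 6443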

From the worker nodes, should I manually configure routing of 10.96.0.1 to the control-plane node(s)? If so, how should I go about it? In my setup the control-plane node has IP 192.168.12.17, while one of the worker nodes has the IP address 192.168.12.20.

Allan K

1 Answer


This error message means that the worker nodes cannot reach the Kubernetes API via its in-cluster address: 10.96.0.1 is the ClusterIP of the default kubernetes Service, which kube-proxy maps to the API server's real address. In other words, some traffic between the nodes and the control plane is being blocked somewhere.
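You can confirm this mapping from a node where kubectl is configured (assuming the default service CIDR):

kubectl get svc kubernetes -n default        # ClusterIP of the in-cluster API service (10.96.0.1 here)
kubectl get endpoints kubernetes -n default  # the API server's real address, typically <control-plane-IP>:6443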

This is usually down to one of three things: network interface configuration, the kubeadm configuration (specifically which IP addresses the nodes advertise), or firewall rules.
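The nodes in the question run CentOS 7, where the default firewall is firewalld rather than ufw. As a sketch (not necessarily the exact fix), the ports kubeadm documents as required could be opened like this:

# On the control-plane node
firewall-cmd --permanent --add-port=6443/tcp        # Kubernetes API server
firewall-cmd --permanent --add-port=2379-2380/tcp   # etcd
firewall-cmd --permanent --add-port=10250/tcp       # kubelet API
firewall-cmd --permanent --add-port=10257/tcp       # kube-controller-manager
firewall-cmd --permanent --add-port=10259/tcp       # kube-scheduler
firewall-cmd --reload

# On each worker node
firewall-cmd --permanent --add-port=10250/tcp       # kubelet API
firewall-cmd --permanent --add-port=30000-32767/tcp # NodePort services
firewall-cmd --reload

Depending on the CNI plugin, additional ports (and often masquerading, enabled with firewall-cmd --permanent --add-masquerade) may be needed as well.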

What worked for me was first observing that everything started working when I turned off all firewalls, and then working my way back towards the secure configuration step by step until something broke. I was using ufw, and its logs showed me which traffic was taking the public interface rather than the private network. The solution then turned out to be a missing set of parameters in the InitConfiguration:

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
...
nodeRegistration:
  kubeletExtraArgs:
    "node-ip": "<insert the controllers private ip here>"
localAPIEndpoint:
  advertiseAddress: "<insert the controller's private IP here>"   # address the API server advertises
  bindPort: 6443

In an HA setup, remember to also add the equivalent settings in the JoinConfiguration (under controlPlane) for the additional control-plane nodes.
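A minimal sketch of what that might look like (kubeadm v1beta3 field names; the discovery/token fields are omitted and the IP is a placeholder):

apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
...
nodeRegistration:
  kubeletExtraArgs:
    node-ip: "<insert this controller's private IP here>"
controlPlane:
  localAPIEndpoint:
    advertiseAddress: "<insert this controller's private IP here>"
    bindPort: 6443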

wessel
  • Turning off the firewall did the trick for me! My issue was that an nginx DaemonSet on 2 nodes was not able to fetch the API version (I don't know why; if I do curl on that system it does work). I will investigate that further, but it was a nice direction at least! – Ankur Feb 17 '23 at 16:09