I maintain a Kubernetes cluster. The nodes are on an intranet with 10.0.0.0/8 addresses, and the pod network range is 192.168.0.0/16.
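For context, the cluster was bootstrapped with kubeadm; the pod CIDR is the kind of setting passed at init time (the command below is illustrative; 192.168.0.0/16 is also Calico's default pool):

kubeadm init --pod-network-cidr=192.168.0.0/16   # pod CIDR must match Calico's IP pool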
The problem is that some of the worker nodes have unreachable routes to the pod networks on other nodes, for example:
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.a.b.65 0.0.0.0 UG 0 0 0 eth0
10.a.b.64 0.0.0.0 255.255.255.192 U 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.20.0 - 255.255.255.192 ! 0 - 0 -
192.168.21.128 - 255.255.255.192 ! 0 - 0 -
192.168.22.64 0.0.0.0 255.255.255.192 U 0 0 0 *
192.168.22.66 0.0.0.0 255.255.255.255 UH 0 0 0 cali3859982c59e
192.168.24.128 - 255.255.255.192 ! 0 - 0 -
192.168.39.192 - 255.255.255.192 ! 0 - 0 -
192.168.49.192 - 255.255.255.192 ! 0 - 0 -
...
192.168.208.128 - 255.255.255.192 ! 0 - 0 -
192.168.228.128 10.14.170.104 255.255.255.192 UG 0 0 0 tunl0
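For reference, the table above is the output of route -n on one of the broken nodes; the ! flag marks a reject (unreachable) route, and - means no gateway or interface is installed for it:

route -n   # print the kernel IP routing table ('U' = up, 'G' = gateway, 'H' = host, '!' = reject)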
When I docker exec into the calico-node container, BIRD reports the routes to other nodes' pod subnets as unreachable:
192.168.108.64/26 unreachable [Mesh_10_15_39_59 08:04:59 from 10.a.a.a] * (100/-) [i]
192.168.112.128/26 unreachable [Mesh_10_204_89_220 08:04:58 from 10.b.b.b] * (100/-) [i]
192.168.95.192/26 unreachable [Mesh_10_204_30_35 08:04:59 from 10.c.c.c] * (100/-) [i]
192.168.39.192/26 unreachable [Mesh_10_204_89_152 08:04:59 from 10.d.d.d] * (100/-) [i]
...
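The bird output above was captured inside the calico-node container, roughly as follows (the control-socket path is the usual Calico one, but it may differ between versions):

docker exec -it <calico-node-container> sh
birdcl -s /var/run/calico/bird.ctl show route   # routes BIRD learned over the node-to-node BGP mesh

From the host, calicoctl node status reports the state of each BGP mesh session, which shows the same problem from the other side.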
As a result, pods on the broken nodes can barely reach anything else in the cluster.
I've tried restarting a broken node, removing it from the cluster, running kubeadm reset, and re-joining it, but nothing changed.
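Concretely, the remove/re-join sequence was roughly the following (node name and join parameters elided):

# on the control-plane node
kubectl drain <node-name> --ignore-daemonsets
kubectl delete node <node-name>
# on the broken worker
kubeadm reset
kubeadm join <apiserver>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>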
What could be causing this, and how should I fix it? Many thanks in advance.