
I have a k8s cluster with two nodes, a master and a worker node, with Calico installed.

I initialized the cluster and installed Calico with the following commands:

# Initialize cluster
kubeadm init --apiserver-advertise-address=<MasterNodePublicIP> --pod-network-cidr=192.168.0.0/16

# Install Calico. Refer to the official document:
# https://docs.projectcalico.org/getting-started/kubernetes/self-managed-onprem/onpremises#install-calico-with-kubernetes-api-datastore-50-nodes-or-less
curl https://docs.projectcalico.org/manifests/calico.yaml -O
kubectl apply -f calico.yaml
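
To confirm the CNI came up after applying the manifest, a check along these lines can be used (a minimal sketch; the k8s-app=calico-node label is the one used by the stock calico.yaml manifest):

# The calico-node agent should be Running on every node
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide

# Both nodes should report Ready once the CNI is initialized
kubectl get nodes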

After that, I found that pods running on different nodes can't communicate with each other, but pods running on the same node can.

Here are my operations:

# With the following command, I ran an nginx pod that was scheduled to the worker node
# and assigned the pod IP 192.168.199.72
kubectl run nginx --image=nginx

# With the following command, I ran a busybox pod that was scheduled to the master node
# and assigned the pod IP 192.168.119.197
kubectl run -it --rm --restart=Never busybox --image=gcr.io/google-containers/busybox sh

# In the busybox shell, I executed the following command
# ('>' marks command output)
wget 192.168.199.72 
> Connecting to 192.168.199.72 (192.168.199.72:80)
> wget: can't connect to remote host (192.168.199.72): Connection timed out

However, if the nginx pod runs on the master node (same as busybox), the wget returns the correct welcome HTML.

(To schedule the nginx pod to the master node, I cordoned the worker node and restarted the nginx pod.)

I also tried scheduling both the nginx and busybox pods to the worker node; again, the wget returned the correct welcome HTML.
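
A useful low-level check for this symptom is to watch whether Calico's encapsulated traffic ever reaches the destination node while the wget is running (a sketch, assuming the default IP-in-IP mode and that eth0 is the node's primary interface; calicoctl is only available if installed separately):

# On the worker node: capture IP-in-IP (IP protocol 4) packets arriving from the master
sudo tcpdump -ni eth0 ip proto 4

# On either node: check that the BGP peering between the nodes is Established
sudo calicoctl node status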


Here is my cluster status; everything looks fine. I have searched everywhere I can but couldn't find a solution.

The master and worker nodes can ping each other via their private IPs.

For the firewall:

systemctl status firewalld
> Unit firewalld.service could not be found.
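
Since no firewall daemon is running on the nodes themselves, the block could sit outside the hosts (e.g. a cloud security group). A quick reachability test for the ports Calico and the control plane need (a sketch; the placeholder IPs stand for the nodes' private addresses):

# BGP port used by Calico to exchange routes (TCP 179)
nc -zv -w 3 <OtherNodePrivateIP> 179

# kube-apiserver port
nc -zv -w 3 <MasterNodePrivateIP> 6443

# Note: IP-in-IP (IP protocol 4) is not a TCP/UDP port, so it cannot be tested with nc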

For node information:

kubectl get nodes -o wide

NAME                     STATUS                     ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
pro-con-scrapydmanager   Ready                      control-plane,master   26h   v1.21.2   10.120.0.5    <none>        CentOS Linux 7 (Core)   3.10.0-957.27.2.el7.x86_64   docker://20.10.5
pro-con-scraypd-01       Ready,SchedulingDisabled   <none>    

For pod information:

kubectl get pods -o wide --all-namespaces

NAMESPACE      NAME                                             READY   STATUS    RESTARTS   AGE   IP                NODE                     NOMINATED NODE   READINESS GATES
default        busybox                                          0/1     Error     0          24h   192.168.199.72    pro-con-scrapydmanager   <none>           <none>
default        nginx                                            1/1     Running   1          26h   192.168.119.197   pro-con-scraypd-01       <none>           <none>
kube-system    calico-kube-controllers-78d6f96c7b-msrdr         1/1     Running   1          26h   192.168.199.77    pro-con-scrapydmanager   <none>           <none>
kube-system    calico-node-gjhwh                                1/1     Running   1          26h   10.120.0.2        pro-con-scraypd-01       <none>           <none>
kube-system    calico-node-x8d7g                                1/1     Running   1          26h   10.120.0.5        pro-con-scrapydmanager   <none>           <none>
kube-system    coredns-558bd4d5db-mllm5                         1/1     Running   1          26h   192.168.199.78    pro-con-scrapydmanager   <none>           <none>
kube-system    coredns-558bd4d5db-whfnn                         1/1     Running   1          26h   192.168.199.75    pro-con-scrapydmanager   <none>           <none>
kube-system    etcd-pro-con-scrapydmanager                      1/1     Running   1          26h   10.120.0.5        pro-con-scrapydmanager   <none>           <none>
kube-system    kube-apiserver-pro-con-scrapydmanager            1/1     Running   1          26h   10.120.0.5        pro-con-scrapydmanager   <none>           <none>
kube-system    kube-controller-manager-pro-con-scrapydmanager   1/1     Running   2          26h   10.120.0.5        pro-con-scrapydmanager   <none>           <none>
kube-system    kube-proxy-84cxb                                 1/1     Running   2          26h   10.120.0.2        pro-con-scraypd-01       <none>           <none>
kube-system    kube-proxy-nj2tq                                 1/1     Running   2          26h   10.120.0.5        pro-con-scrapydmanager   <none>           <none>
kube-system    kube-scheduler-pro-con-scrapydmanager            1/1     Running   1          26h   10.120.0.5        pro-con-scrapydmanager   <none>           <none>
lens-metrics   kube-state-metrics-78596b555-zxdst               1/1     Running   1          26h   192.168.199.76    pro-con-scrapydmanager   <none>           <none>
lens-metrics   node-exporter-ggwtc                              1/1     Running   1          26h   192.168.199.73    pro-con-scrapydmanager   <none>           <none>
lens-metrics   node-exporter-sbz6t                              1/1     Running   1          26h   192.168.119.196   pro-con-scraypd-01       <none>           <none>
lens-metrics   prometheus-0                                     1/1     Running   1          26h   192.168.199.74    pro-con-scrapydmanager   <none>           <none>

For services:

kubectl get services -o wide --all-namespaces

NAMESPACE      NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
default        kubernetes           ClusterIP   10.96.0.1       <none>        443/TCP                  26h   <none>
default        nginx                ClusterIP   10.99.117.158   <none>        80/TCP                   24h   run=nginx
kube-system    kube-dns             ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   26h   k8s-app=kube-dns
lens-metrics   kube-state-metrics   ClusterIP   10.104.32.63    <none>        8080/TCP                 26h   name=kube-state-metrics
lens-metrics   node-exporter        ClusterIP   None            <none>        80/TCP                   26h   name=node-exporter,phase=prod
lens-metrics   prometheus           ClusterIP   10.111.86.164   <none>        80/TCP                   26h   name=prometheus
  • Can you paste the Calico YAML link you applied for the CNI in the question, and also check whether firewalld is running? – confused genius Jul 19 '21 at 06:18
  • Can you specify what exactly you mean by "cannot communicate with each other"? What have you tried? What is the actual behavior compared to the expected behavior? – meaningqo Jul 19 '21 at 07:25

1 Answer


OK, it was the firewall's fault. I opened all of the following ports on my master node and recreated my cluster; after that everything worked and the cni0 interface appeared, although I still don't know why.

While troubleshooting, I found that the cni0 interface is important: without it, I could not ping pods running on a different node.
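
For reference, the relevant interfaces and routes can be inspected directly on each node (a sketch; with Calico in its default IP-in-IP mode the tunnel interface is usually tunl0, with per-pod cali* veths alongside it, while cni0 is a bridge created by some other CNI plugins):

# List CNI-related interfaces on the node
ip -br addr show | grep -E 'cni0|tunl0|cali'

# Routes to pods on the other node should point at that node via the tunnel
ip route | grep 192.168.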

(Refer: https://docs.projectcalico.org/getting-started/bare-metal/requirements)

Configuration                                       Host(s)               Connection type   Port/protocol
Calico networking (BGP)                             All                   Bidirectional     TCP 179
Calico networking with IP-in-IP enabled (default)   All                   Bidirectional     IP-in-IP, often represented by its protocol number 4
Calico networking with VXLAN enabled                All                   Bidirectional     UDP 4789
Calico networking with Typha enabled                Typha agent hosts     Incoming          TCP 5473 (default)
flannel networking (VXLAN)                          All                   Bidirectional     UDP 4789
All                                                 kube-apiserver host   Incoming          Often TCP 443 or 6443*
etcd datastore                                      etcd hosts            Incoming          Officially TCP 2379 but can vary
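
Since firewalld was not even installed on these nodes, the blocking presumably happened outside the hosts (e.g. cloud security-group rules). For completeness, the same openings expressed as plain iptables rules would look roughly like this (a sketch based on the table above, not the exact rules used here):

# Calico BGP
iptables -A INPUT -p tcp --dport 179 -j ACCEPT
# IP-in-IP encapsulation (IP protocol 4, the Calico default)
iptables -A INPUT -p 4 -j ACCEPT
# VXLAN, only if VXLAN mode is enabled
iptables -A INPUT -p udp --dport 4789 -j ACCEPT
# Typha, only if enabled
iptables -A INPUT -p tcp --dport 5473 -j ACCEPT
# kube-apiserver
iptables -A INPUT -p tcp --dport 6443 -j ACCEPT
# etcd (officially TCP 2379, but can vary)
iptables -A INPUT -p tcp --dport 2379 -j ACCEPT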