
Something went wrong with my RPi 4 cluster set up with k3sup.

Everything worked as expected until yesterday, when I had to reinstall the master node's operating system. For example, I have Redis installed on the master node and some pods on the worker nodes. My pods can no longer connect to Redis via DNS at redis-master.database.svc.cluster.local (but they could the day before).

It throws a "can not resolve domain" error when I test with busybox:

kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup redis-master.database.svc.cluster.local

When I ping my service by IP (also from busybox):

kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- ping 10.43.115.159

it reports 100% packet loss.

I can work around the DNS issue by simply editing the CoreDNS config (replacing the line forward . /etc/resolv.conf with forward . 192.168.1.101), but I don't think that's a good solution, as I didn't have to do that before.
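For reference, the workaround amounts to this change in the coredns ConfigMap (edited with kubectl -n kube-system edit configmap coredns); the surrounding directives are a trimmed sketch of the default k3s Corefile and may differ slightly in your cluster:

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    }
    # default: forward . /etc/resolv.conf
    forward . 192.168.1.101
    cache 30
    loop
    reload
}
```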

Also, while that fixes domain-to-IP resolution, connecting via IP still doesn't work.

My nodes:

NAME     STATUS   ROLES    AGE   VERSION         INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION   CONTAINER-RUNTIME
node-4   Ready    <none>   10h   v1.19.15+k3s2   192.168.1.105   <none>        Debian GNU/Linux 10 (buster)   5.10.60-v8+      containerd://1.4.11-k3s1
node-3   Ready    <none>   10h   v1.19.15+k3s2   192.168.1.104   <none>        Debian GNU/Linux 10 (buster)   5.10.60-v8+      containerd://1.4.11-k3s1
node-1   Ready    <none>   10h   v1.19.15+k3s2   192.168.1.102   <none>        Debian GNU/Linux 10 (buster)   5.10.60-v8+      containerd://1.4.11-k3s1
node-0   Ready    master   10h   v1.19.15+k3s2   192.168.1.101   <none>        Debian GNU/Linux 10 (buster)   5.10.63-v8+      containerd://1.4.11-k3s1
node-2   Ready    <none>   10h   v1.19.15+k3s2   192.168.1.103   <none>        Debian GNU/Linux 10 (buster)   5.10.60-v8+      containerd://1.4.11-k3s1

Master node has a taint: role=master:NoSchedule.

Any ideas?

UPDATE 1

I'm able to exec into the redis pod. /etc/resolv.conf from redis-master-0:

search database.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5
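Note that with ndots:5, even the fully qualified name redis-master.database.svc.cluster.local (4 dots) is first expanded through the search list before being tried as-is, so the search domains matter for every lookup here. A minimal Python sketch of that expansion logic (illustrative only, not glibc's exact algorithm):

```python
def candidate_names(name,
                    search=("database.svc.cluster.local",
                            "svc.cluster.local",
                            "cluster.local"),
                    ndots=5):
    # If the name contains fewer than `ndots` dots, the resolver
    # tries each search suffix first, then the name as given.
    if name.count(".") < ndots:
        return [f"{name}.{s}" for s in search] + [name]
    return [name]

# The short service name resolves via the first search suffix:
print(candidate_names("redis-master")[0])
# -> redis-master.database.svc.cluster.local
```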

All services in the cluster:

NAMESPACE       NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP                                               PORT(S)                      AGE
default         kubernetes              ClusterIP      10.43.0.1       <none>                                                    443/TCP                      6d9h
kube-system     traefik-prometheus      ClusterIP      10.43.94.137    <none>                                                    9100/TCP                     6d8h
registry        proxy-docker-registry   ClusterIP      10.43.16.139    <none>                                                    5000/TCP                     6d8h
kube-system     kube-dns                ClusterIP      10.43.0.10      <none>                                                    53/UDP,53/TCP,9153/TCP       6d9h
kube-system     metrics-server          ClusterIP      10.43.101.30    <none>                                                    443/TCP                      6d9h
database        redis-headless          ClusterIP      None            <none>                                                    6379/TCP                     5d19h
database        redis-master            ClusterIP      10.43.115.159   <none>                                                    6379/TCP                     5d19h
kube-system     traefik                 LoadBalancer   10.43.221.89    192.168.1.102,192.168.1.103,192.168.1.104,192.168.1.105   80:30446/TCP,443:32443/TCP   6d8h
psalkowski
  • Very difficult to say. Did you re-create the cluster after manipulations with OS? Also it may be `coreDNS` issue - try going through [DNS troubleshooting steps](https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/) – moonkotte Oct 06 '21 at 16:57
  • @moonkotte I went through the DNS troubleshooting with no luck. coredns has the correct configuration. resolv.conf, services, endpoints, and config maps look identical to the example, but nslookup can not find anything. The only diff is that I get this entry in coredns: `[INFO] 127.0.0.1:48012 - 6684 "HINFO IN 3620455167514711704.6584998321407267960. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.055304378s` when doing `nslookup redis-master.database.svc.cluster.local` – psalkowski Oct 07 '21 at 17:23
  • Can you `nslookup google.com`? – moonkotte Oct 07 '21 at 19:52
  • @moonkotte no, I can't – psalkowski Oct 08 '21 at 05:42
  • Based on log entity you shared there's only one line which is not ERROR, it's just an information message that is not related to `redis`. Can you `kubectl exec` in `redis` pod? Update you question with information in `/etc/resolv.conf` in the pod. + `kubectl get svc -A` - which IP address coredns service have? – moonkotte Oct 08 '21 at 11:35
  • @moonkotte I have updated my question with /etc/resolv.conf and the list of services. One thing I noticed is that traefik load-balances traffic across 192.168.1.102 - 192.168.1.105. It does not include the master node IP 192.168.1.101, but I'm not sure whether it should. – psalkowski Oct 12 '21 at 05:45

1 Answer


There was one more thing I didn't mention: I'm using OpenVPN with a NordVPN server list on the master node, and Privoxy on the worker nodes.

When OpenVPN is installed and started before the Kubernetes master comes up, it adds routing/firewall rules that block Kubernetes networking. As a result, CoreDNS does not work and you can't reach any pod by IP either.

Since I'm running an RPi 4 cluster, it was good enough for me to reinstall the master node, install Kubernetes first, and only then configure OpenVPN. Now everything works as expected.

Alternatively, it's enough to order your systemd units by adding After= or Before= to the service definition. My VPN systemd service looks like this:

[Unit]
Description=Enable VPN for System
After=network.target
After=k3s.service

[Service]
Type=simple
ExecStart=/etc/openvpn/start-nordvpn-server.sh

[Install]
WantedBy=multi-user.target

This guarantees that the VPN starts after Kubernetes.
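If your OpenVPN service comes from a package and you'd rather not edit the vendor unit, a systemd drop-in achieves the same ordering. The unit name openvpn.service below is an assumption; adjust it to match your actual VPN unit, then run systemctl daemon-reload:

```
# /etc/systemd/system/openvpn.service.d/10-after-k3s.conf
[Unit]
After=k3s.service
```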
