
I have added a new node to my k8s cluster, but I found that some pods allocated to this node cannot show logs, like this:

$ kubectl logs -n xxxx xxxxx-6d5bdd7d6f-5ps6k

Unable to connect to the server: EOF
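As a quick sanity check (keeping the placeholder names from above), one can confirm that the failing pods are all scheduled on the new node:

$ kubectl get pod -n xxxx xxxxx-6d5bdd7d6f-5ps6k -o wide   # the NODE column should show the new node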

Using Lens gives an error like this:

Failed to load logs: request to http://127.0.0.1:49271/api-kube/api/v1/namespaces/xxxxxxx/pods/xxxx34-27736483--1-hxjpv/log?tailLines=500&timestamps=true&container=xxxxxx&previous=false failed, reason: socket hang up
Reason: undefined (ECONNRESET)

I believe there's some problem with this node. When I use port-forwarding:

$ kubectl port-forward -n argocd svc/argocd-notifications-controller-metrics 9001:9001
error: error upgrading connection: error dialing backend: dial tcp 10.0.6.20:10250: i/o timeout
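One way to check whether anything is listening on that address and port at all is something like the following, run from another node (a diagnostic sketch, not part of the original report; -k skips certificate verification):

$ curl -k https://10.0.6.20:10250
# Any quick answer, even an HTTP 401/403 or a certificate error, means the port is reachable;
# hanging until an i/o timeout points at a firewall or routing problem on that node.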

I think the internal IP 10.0.6.20 is wrong.
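To check whether that address matches the node's real IP (worker4 is the node name used elsewhere in this post), one can compare what the API server has registered with what is actually configured on the node:

$ kubectl get node worker4 -o wide   # the INTERNAL-IP column is what the API server uses to reach the kubelet
$ ssh worker4 ip -4 addr show        # compare with the addresses actually present on the node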

All kube-proxy pods show Running in kubectl:

-> % kgp -o wide -n kube-system | grep kube-proxy
kube-proxy-7pg9d                                  1/1     Running     1 (2d20h ago)   29d     10.0.6.20        worker4     
kube-proxy-cqh2c                                  1/1     Running     1 (15d ago)     29d     10.0.6.3         worker3           
kube-proxy-lp4cd                                  1/1     Running     0               29d     10.0.6.1         worker1           
kube-proxy-r6bgw                                  1/1     Running     0               29d     10.0.6.2         worker2

But using crictl pods on each node to look for these pods shows them as Ready or NotReady:

# crictl pods | grep kube-proxy
ceef94b060e56       2 days ago          Ready               kube-proxy-7pg9d                                   kube-system         1                   (default)
418bd5b46c2b9       4 weeks ago         NotReady            kube-proxy-7pg9d                                   kube-system         0                   (default)

I am using Calico as the CNI, with kube-proxy in IPVS mode. How can I fix this?

  • There's nothing wrong with those NotReady pods, as long as you have another one that is Ready, started afterwards. What makes you think `10.0.6.20` is wrong; what's your node IP? Any chance your DNS would resolve "worker4" to the wrong IP? Regardless, a connection timing out to port 10250 (kubelet) indeed suggests there's no one listening on that IP. I would ssh to that node, check ip configuration, routes, compare with a working node. – SYN Oct 09 '22 at 08:10
  • from crictl's perspective, a pod that is NotReady may just be some leftover from a previous kubernetes pod. If you run `crictl ps`, you would see running containers, which would belong to ready pods, while `crictl ps -a` would show you exited containers, usually belonging to unready pods. Usually, unready crictl pods can be removed, manually or using scripts, when kubelet leaves them behind (a minimal cleanup sketch follows these comments). – SYN Oct 09 '22 at 08:16
  • @SYN, because I do port-forwarding from my computer, which should not in the subnet where `10.0.6.20` is reachable, the node ip should be an actual IP of worker4 – Andy Huang Oct 09 '22 at 08:24
  • "I do port-forwarding" : please explain what do you mean here. "which should not be in the subnet where 10.0.6.20 is reachable" ; makes no sense, you realize your other workers are in the same /26? "should be an actual IP of worker4" : again, what's that node IP, then? Have you checked ip & routing configuration on that node? What about dns resolution on the other node (specifically: the api) – SYN Oct 09 '22 at 08:27
  • Keeping in mind the "IP" address showing on your pods is usually set by the kubelet instance running on your node, based on actual values at the time of starting the pod. Should we guess that you're using DHCP assigning addresses to your nodes, and the lease for worker4 was somehow renewed? Feel free to add details to your original post – SYN Oct 09 '22 at 08:31
  • @SYN I use port-forwarding with kubectl from my local computer. `kubectl port-forward -n argocd svc/argocd-notifications-controller-metrics 9001:9001` shows this error: `error: error upgrading connection: error dialing backend: dial tcp 10.0.6.20:10250: i/o timeout`. 10.0.6.20 is impossible to reach from my local computer. I am not sure how to check routing, configuration, and DNS among nodes. – Andy Huang Oct 10 '22 at 03:01
  • We would use kubectl port-forward connecting SDN addresses/services. Here, we're talking about the IP of your node (unrelated to your SDN). You should have some kind of access over there. – SYN Oct 10 '22 at 06:59
  • @SYN, I can ssh to all nodes – Andy Huang Oct 10 '22 at 11:35
  • ... so what is your worker4 node IP? if 10.0.6.20: check kubelet logs, as we could see there's no answer from here. Since timeout (and not refused), I suspect you have some DHCP, node changed addresses, and somehow API doesn't yet know about it. – SYN Oct 10 '22 at 11:45
  • @SYN worker4's IP is `213.108.105.12`. The kubelet service is run by systemd; I used `journalctl -xeu kubelet` to look at its logs, but most of the logs are about specific container statuses. What specific keywords should I look for? – Andy Huang Oct 10 '22 at 13:10
  • I found iptables on worker4 is dropping all packets from kubeapiserver – Andy Huang Oct 10 '22 at 13:54
  • any clue why?! lacking anything better to suggest: would a reboot help, maybe? – SYN Oct 10 '22 at 15:19
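As one of the comments above notes, the NotReady entry in crictl is usually just a leftover sandbox from a previous pod instance. A minimal cleanup sketch, using the stale sandbox ID from the crictl output in the question (optional housekeeping, unrelated to the actual fix below):

# crictl stopp 418bd5b46c2b9   # stop the leftover pod sandbox, if it is not already stopped
# crictl rmp 418bd5b46c2b9     # remove the sandbox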

1 Answer


I solved this problem with the following procedure:

Worker4

Make sure kubelet is listening on its default port (10250):

# lsof -i:10250
COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
kubelet 819 root   26u  IPv4  13966      0t0  TCP worker4.cluster.local:10250 (LISTEN)
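If lsof is not available, an equivalent check (my addition, not part of the original steps) is:

# ss -tlnp | grep 10250   # kubelet should appear in LISTEN state on port 10250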

From worker1, curl https://10.0.6.20:10250 gets a timeout, but curl https://10.0.6.1:10250 (worker1's kubelet) run from worker4 responds quickly.

So packets are probably being dropped inside worker4.

To log dropped packets on worker4, following https://www.thegeekstuff.com/2012/08/iptables-log-packets/:

iptables -N LOGGING
iptables -A INPUT -j LOGGING
iptables -A LOGGING -m limit --limit 2/min -j LOG --log-prefix "IPTables-Dropped: " --log-level 4
iptables -A LOGGING -j DROP

This saves the log entries to /var/log/syslog.

Then filter the logs with:

tail -200 /var/log/syslog | grep IPTables-Dropped | grep 10.0.6.1
Oct 10 13:49:37 compute kernel: [637626.880648] IPTables-Dropped: IN=eth1 OUT= MAC=00:16:ce:d4:b7:01:00:16:b2:77:89:01:08:00 SRC=10.0.6.1 DST=10.0.6.20 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=29087 DF PROTO=TCP SPT=58838 DPT=10250 WINDOW=64240 RES=0x00 SYN URGP=0

So I am convinced the packets are being dropped.
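Before adding an ACCEPT rule, the per-rule packet counters can help pin down which existing rule is doing the dropping (a diagnostic aside, not part of the original steps):

# iptables -L INPUT -n -v --line-numbers   # look for DROP rules whose pkts counter keeps growing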

Adding a rule (inserted at the top of INPUT, so it matches before the DROP) to accept traffic to the kubelet port:

iptables -I INPUT -s 10.0.0.0/8 -p tcp --dport 10250 -j ACCEPT
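A rule added this way is lost on reboot. On a Debian/Ubuntu node with iptables-persistent installed (an assumption; the original post does not say which distribution is used), it can be saved with something like:

# netfilter-persistent save   # writes the current rules to /etc/iptables/rules.v4 and rules.v6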

Then I can attach a shell to, and get logs from, pods on the node. I appreciate the discussion with @SYN.

  • it's weird that this port would not be reachable; it is indeed critical for kubernetes operations. kube-proxy and SDN components may set up rules on your nodes, but that should not interfere with this. If rebooting did not help, it could be worth investigating where that rule came from and why you don't have it on your other nodes... still, nice catch. Pretty weird. – SYN Oct 12 '22 at 16:53