We've been getting intermittent connectivity/DNS issues in our Kubernetes cluster, which is running version 1.10 on Ubuntu.
We've been through the relevant bug reports and issues, and as near as we can figure, a process is holding onto /run/xtables.lock, which is causing problems in one of the kube-proxy pods.
One of the kube-proxy pods bound to a worker node has this error repeating in its logs:
E0920 13:39:42.758280 1 proxier.go:647] Failed to ensure that filter chain KUBE-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
E0920 13:46:46.193919 1 proxier.go:647] Failed to ensure that filter chain KUBE-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
E0920 14:05:45.185720 1 proxier.go:647] Failed to ensure that filter chain KUBE-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
E0920 14:11:52.455183 1 proxier.go:647] Failed to ensure that filter chain KUBE-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
E0920 14:38:36.213967 1 proxier.go:647] Failed to ensure that filter chain KUBE-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
E0920 14:44:43.442933 1 proxier.go:647] Failed to ensure that filter chain KUBE-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
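As the message itself says, exit status 4 here means iptables gave up waiting for the xtables lock after 5 seconds. For reference, this is roughly what we run on the affected node to see whether the lock is actually contended (a sketch; the -w flag tells iptables to wait for /run/xtables.lock instead of failing immediately):

# Harmless read of the filter table, waiting up to 5 seconds for the lock;
# if something else holds /run/xtables.lock this fails the same way kube-proxy does.
sudo iptables -w 5 -L -n > /dev/null && echo "lock acquired" || echo "lock busy, exit $?"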
These errors started about three weeks ago, and so far we've been unable to remedy them. Because the problems were intermittent, we didn't track them down to this until now.
We think this is also causing one of the kube-flannel-ds pods to be stuck in a perpetual CrashLoopBackOff state (see the inspection commands after the pod listing below):
NAME                                  READY   STATUS             RESTARTS   AGE
coredns-78fcdf6894-6z6rs              1/1     Running            0          40d
coredns-78fcdf6894-dddqd              1/1     Running            0          40d
etcd-k8smaster1                       1/1     Running            0          40d
kube-apiserver-k8smaster1             1/1     Running            0          40d
kube-controller-manager-k8smaster1    1/1     Running            0          40d
kube-flannel-ds-amd64-sh5gc           1/1     Running            0          40d
kube-flannel-ds-amd64-szkxt           0/1     CrashLoopBackOff   7077       40d
kube-proxy-6pmhs                      1/1     Running            0          40d
kube-proxy-d7d8g                      1/1     Running            0          40d
kube-scheduler-k8smaster1             1/1     Running            0          40d
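In case it's useful, these are roughly the commands we use to inspect the crashing flannel pod (a sketch; the pod name comes from the listing above, and we're assuming the default kube-system namespace for the flannel DaemonSet):

# Logs from the previous, crashed container instance
kubectl -n kube-system logs kube-flannel-ds-amd64-szkxt --previous

# Recent events and the container's last state / exit code
kubectl -n kube-system describe pod kube-flannel-ds-amd64-szkxt

# Force the DaemonSet controller to recreate the pod
kubectl -n kube-system delete pod kube-flannel-ds-amd64-szkxt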
Most bug reports around /run/xtables.lock suggest the issue was resolved in July 2017, but we're seeing it on a new setup. As far as we can tell, the chain configuration in iptables looks correct.
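For reference, this is roughly how we check the chains (a sketch; the chain names come from the kube-proxy error above, and KUBE-SERVICES exists in both the filter and nat tables):

# Confirm the chains kube-proxy complains about actually exist
sudo iptables -w -L KUBE-SERVICES -n | head
sudo iptables -w -L KUBE-EXTERNAL-SERVICES -n | head
sudo iptables -w -t nat -L KUBE-SERVICES -n | head

# Or dump everything kube-proxy manages in one pass
sudo iptables-save | grep KUBE- | head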
Running fuser /run/xtables.lock on the node returns nothing.
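Since iptables normally holds the lock for only an instant, a one-off fuser check can easily miss the holder; a polling loop along these lines (rough sketch) might have better luck catching it:

# Poll the lock file and print any holder the moment one shows up
while true; do
  holder=$(sudo fuser -v /run/xtables.lock 2>&1)
  [ -n "$holder" ] && echo "$(date +%T) $holder"
  sleep 0.5
done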
Does anybody have any insight into this? It's causing a lot of pain.