 cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

Clean install of Kubernetes using kubeadm init, following the steps directly from the docs. Tried with Flannel, Weave Net, and Calico.
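
For reference, the install was essentially the standard kubeadm flow from the docs; the sketch below is reconstructed from memory, so the exact flags and CNI manifest URL may differ slightly from what was actually run (the pod CIDR shown is the one Flannel expects):

# rough reconstruction of the install on the master; flags/URLs are from memory
swapoff -a
systemctl enable --now docker kubelet
kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# followed by one CNI manifest at a time, e.g. Flannel:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml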

After about 5-10 minutes of running watch kubectl get nodes, I start getting these messages at random, leaving me with an inaccessible cluster that I can't apply any .yml files to:

Unable to connect to the server: net/http: TLS handshake timeout

The connection to the server 66.70.180.162:6443 was refused - did you specify the right host or port?

Unable to connect to the server: http2: server sent GOAWAY and closed the connection; LastStreamID=1, ErrCode=NO_ERROR, debug=""
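
When these errors start, a couple of basic checks directly on the master (nothing from the docs, just standard tooling) show whether anything is still bound to 6443 at that moment:

ss -tlnp | grep 6443                    # is anything still listening on the API server port?
curl -k https://localhost:6443/healthz  # does the API server answer over loopback at all?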

The kubelet itself is fine, aside from showing that it can't fetch various resources from 66.70.180.162 (the master node):

[root@play ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Mon 2018-12-10 13:57:17 EST; 21min ago
     Docs: https://kubernetes.io/docs/
 Main PID: 3411939 (kubelet)
   CGroup: /system.slice/kubelet.service
           └─3411939 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-...

Dec 10 14:18:53 play kubelet[3411939]: E1210 14:18:53.811213 3411939 reflector.go:134] object-"kube-system"/"kube-proxy": Failed to list *v1.ConfigMap: Get https://66.70.180.162:6443/api/v1/namespaces/kube-system/c...
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.011239 3411939 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://66.70.180.162:6443/api/v1/...nection refused
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.211160 3411939 reflector.go:134] object-"kube-system"/"kube-proxy-token-n5qjm": Failed to list *v1.Secret: Get https://66.70.180.162:6443/api/v1/namespaces/kube...
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.411190 3411939 reflector.go:134] object-"kube-system"/"coredns-token-7qjzv": Failed to list *v1.Secret: Get https://66.70.180.162:6443/api/v1/namespaces/kube-sy...
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.611103 3411939 reflector.go:134] object-"kube-system"/"coredns": Failed to list *v1.ConfigMap: Get https://66.70.180.162:6443/api/v1/namespaces/kube-system/conf...
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.811105 3411939 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://66.70.180.162:6443/api/v1/nod...nection refused
Dec 10 14:18:55 play kubelet[3411939]: E1210 14:18:55.011204 3411939 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://66.70.180.162:6443/api...nection refused
Dec 10 14:18:55 play kubelet[3411939]: E1210 14:18:55.211132 3411939 reflector.go:134] object-"kube-system"/"weave-net-token-5zb86": Failed to list *v1.Secret: Get https://66.70.180.162:6443/api/v1/namespaces/kube-...
Dec 10 14:18:55 play kubelet[3411939]: E1210 14:18:55.411281 3411939 reflector.go:134] object-"kube-system"/"kube-proxy": Failed to list *v1.ConfigMap: Get https://66.70.180.162:6443/api/v1/namespaces/kube-system/c...
Dec 10 14:18:55 play kubelet[3411939]: E1210 14:18:55.611125 3411939 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://66.70.180.162:6443/api/v1/...nection refused
Hint: Some lines were ellipsized, use -l to show in full.
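
Since the kubelet is only complaining that it can't reach the API server, the next thing worth looking at (assuming the default Docker runtime, which is what this box uses) is whether the kube-apiserver static-pod container is crash-looping:

# are the kubeadm static-pod containers flapping?
docker ps -a --filter name=k8s_kube-apiserver
docker ps -a --filter name=k8s_etcd
# last logs from the most recent apiserver container
docker logs --tail 50 $(docker ps -aq --filter name=k8s_kube-apiserver | head -1)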

The Docker container running CoreDNS shows issues reaching what looks like anything in the default Kubernetes Service subnet CIDR (what's shown here is a VPS on a separate hosting provider being addressed by a local IP):

.:53
2018-12-10T10:34:52.589Z [INFO] CoreDNS-1.2.6
2018-12-10T10:34:52.589Z [INFO] linux/amd64, go1.11.2, 756749c
CoreDNS-1.2.6
linux/amd64, go1.11.2, 756749c
 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769

...

E1210 10:55:53.286644       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E1210 10:55:53.290019       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
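
For context, 10.96.0.1:443 is the ClusterIP of the default kubernetes Service, which kube-proxy DNATs to the real API server endpoint (66.70.180.162:6443) via iptables. A rough way to confirm that mapping exists, on the occasions kubectl does respond, is (assuming kube-proxy's default iptables mode):

kubectl get svc kubernetes -n default        # should show CLUSTER-IP 10.96.0.1 on port 443
kubectl get endpoints kubernetes -n default  # should list 66.70.180.162:6443
iptables-save | grep 10.96.0.1               # KUBE-SERVICES/KUBE-SVC DNAT rules for the ClusterIP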

The kube-apiserver is just showing random failures, and it looks like it's mixing in IPv6:

I1210 19:23:09.067462       1 trace.go:76] Trace[1029933921]: "Get /api/v1/nodes/play" (started: 2018-12-10 19:23:00.256692931 +0000 UTC m=+188.530973072) (total time: 8.810746081s):
Trace[1029933921]: [8.810746081s] [8.810715241s] END
E1210 19:23:09.068687       1 available_controller.go:316] v2beta1.autoscaling failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v2beta1.autoscaling/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.069678       1 available_controller.go:316] v1. failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1./status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.073019       1 available_controller.go:316] v1beta1.apiextensions.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.apiextensions.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.074112       1 available_controller.go:316] v1beta1.batch failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.batch/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.075151       1 available_controller.go:316] v2beta2.autoscaling failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v2beta2.autoscaling/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.077408       1 available_controller.go:316] v1.authorization.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.authorization.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.078457       1 available_controller.go:316] v1.networking.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.networking.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.079449       1 available_controller.go:316] v1beta1.coordination.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.coordination.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.080558       1 available_controller.go:316] v1.authentication.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.authentication.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.081628       1 available_controller.go:316] v1beta1.scheduling.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.scheduling.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.082803       1 available_controller.go:316] v1.autoscaling failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.autoscaling/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.083845       1 available_controller.go:316] v1beta1.events.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.events.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.084882       1 available_controller.go:316] v1beta1.storage.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.storage.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.085985       1 available_controller.go:316] v1.apps failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.apps/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.087019       1 available_controller.go:316] v1beta1.apps failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.apps/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.088113       1 available_controller.go:316] v1beta1.certificates.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.certificates.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.089164       1 available_controller.go:316] v1.storage.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.storage.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.090268       1 available_controller.go:316] v1beta1.authentication.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.authentication.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
W1210 19:23:28.996746       1 controller.go:181] StopReconciling() timed out
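
The [::1]:6443 dials are the API server's own controllers talking to it over loopback, so "connection refused" there points at the apiserver process itself dropping rather than at node-to-node networking. The checks below are just the standard kubeadm/CentOS 7 pre-flight items plus a look for whatever might be killing the containers, not anything specific to this setup:

# swap must stay off, or kubelet/apiserver behave erratically
swapoff -a && sed -i '/ swap / s/^/#/' /etc/fstab
# bridged traffic has to traverse iptables for the service CIDR to work
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system
# SELinux and firewalld are the other usual CentOS 7 culprits
getenforce
firewall-cmd --state && firewall-cmd --list-ports
# anything (OOM killer, docker restarts) repeatedly killing the control-plane containers?
journalctl -u docker --since "30 min ago" | grep -iE 'oom|restart|error' | tail
dmesg | grep -i 'killed process' | tail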

And I'm out of troubleshooting steps.

sfxworks
  • What environment are you running (bare metal, cloud VMs)? Could you also post the results of `ps -aux | grep kube`, followed by `systemctl restart kube-apiserver` if required (does it keep working for more than 10 minutes?) – aurelius Dec 11 '18 at 17:01
  • One's a VM in the cloud and the other is bare metal. Debian 9 on that same cloud environment worked. Pulled Docker directly from their repo following this guide: https://docs.docker.com/install/linux/docker-ce/debian/ – sfxworks Dec 12 '18 at 00:42
  • Can you check the health status of the core components on the master node with `kubectl get pods -n kube-system`? – Nick_Kh Dec 27 '18 at 09:39
  • No k8s API call worked; they failed at random. kubectl get pods fell under that category. – sfxworks Jan 02 '19 at 07:55
  • You need to provide more details about the installation, and also check the iptables settings on both the cloud and the bare-metal machines, as there might be some restrictions. – Crou Jan 02 '19 at 12:05
  • This was a fresh install from OVH on a dedicated server and an OpenStack VM using the CentOS 7 image. It may work on other hosting providers, as both environments did not have any known limitations. If you're having trouble replicating this, please let me know what information you would like me to provide. – sfxworks Jan 03 '19 at 00:12
  • Can you share logs showing how often the drop occurs? Also, can you test the connection between those servers, for example with a ping, to check whether the network is the issue here. – Crou Jan 04 '19 at 09:14

0 Answers