I've been trying to set up a Kubernetes cluster for a few months now, but I've had no luck so far.
I'm trying to set it up on 4 bare-metal PCs running CoreOS. I've just clean-installed everything again, but I run into the same problem as before. I'm following this tutorial. I think I've configured everything correctly, but I'm not 100% sure. When I reboot any of the machines, the kubelet and flanneld services are running, but I see the following errors for them when checking the service status with systemctl status:
kubelet error: Process: 1246 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid (code=exited, status=254)
flanneld error: Process: 1057 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/lib/coreos/flannel-wrapper.uuid (code=exited, status=254)
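If I understand the units correctly, these pre-start `rkt rm` calls are just cleanup of a stale pod UUID file, so exit status 254 (e.g. when the file doesn't exist yet) may be harmless. One way I could silence it, sketched here against what I assume the stock unit looks like, is systemd's `-` prefix, which tells systemd to ignore a non-zero exit status from that ExecStartPre line:

```ini
# Excerpt from kubelet.service (same idea applies to flanneld's wrapper).
# The leading '-' makes a non-zero exit from this cleanup command
# non-fatal, so a missing/stale UUID file no longer shows as an error.
[Service]
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid
```

I haven't confirmed whether the tutorial's units already do this; if they don't, that would explain the status output.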
If I restart both services, they work, or at least appear to work - I get no errors.
Everything else seems to work fine, so the only remaining problem (I think) is the kube-proxy service on all nodes.
If I run kubectl get pods, I see all pods running:
$ kubectl get pods
NAME                                   READY     STATUS    RESTARTS   AGE
kube-apiserver-kubernetes-4            1/1       Running   4          6m
kube-controller-manager-kubernetes-4   1/1       Running   6          6m
kube-proxy-kubernetes-1                1/1       Running   4          18h
kube-proxy-kubernetes-2                1/1       Running   5          26m
kube-proxy-kubernetes-3                1/1       Running   4          19m
kube-proxy-kubernetes-4                1/1       Running   4          18h
kube-scheduler-kubernetes-4            1/1       Running   6          18h
The answer to this question suggests checking whether kubectl get node returns the same names that are registered on the kubelet. As far as I can tell from the logs, the nodes are registered correctly, and this is the output of kubectl get node:
$ kubectl get node
NAME           STATUS                     AGE       VERSION
kubernetes-1   Ready                      18h       v1.6.1+coreos.0
kubernetes-2   Ready                      36m       v1.6.1+coreos.0
kubernetes-3   Ready                      29m       v1.6.1+coreos.0
kubernetes-4   Ready,SchedulingDisabled   18h       v1.6.1+coreos.0
The tutorial I've used (linked above) suggests using --hostname-override, but with it I couldn't get node info on the master node (kubernetes-4) when I tried to curl it locally. So I removed the flag, and now I can get node info normally.
Someone suggested it might be a flannel problem and that I should check the flannel ports. Using netstat -lntu, I get the following output:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN
tcp        0      0 MASTER_IP:2379          0.0.0.0:*               LISTEN
tcp        0      0 MASTER_IP:2380          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN
tcp6       0      0 :::4194                 :::*                    LISTEN
tcp6       0      0 :::10250                :::*                    LISTEN
tcp6       0      0 :::10251                :::*                    LISTEN
tcp6       0      0 :::10252                :::*                    LISTEN
tcp6       0      0 :::10255                :::*                    LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
tcp6       0      0 :::443                  :::*                    LISTEN
udp        0      0 0.0.0.0:8472            0.0.0.0:*
So I assume the ports are fine?
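8472/udp is flannel's VXLAN port, so at least that listener is there. To double-check that flannel actually got its network config and a subnet lease from etcd, these are the commands I'd run next (assuming the default /coreos.com/network etcd prefix and the standard CoreOS flannel paths):

```shell
# The network config flanneld reads from etcd (must exist before flanneld starts)
etcdctl get /coreos.com/network/config

# The subnet lease flannel writes for this host after a successful start
cat /run/flannel/subnet.env

# The VXLAN interface flannel should have created (backs the 8472/udp socket)
ip -d link show flannel.1
```

If the etcd key is missing or flannel.1 doesn't exist, that would point at flannel rather than kube-proxy.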
etcd2 also works; etcdctl cluster-health shows that all nodes are healthy.
This is the part of the cloud-config that starts etcd2 on reboot; besides that, it only stores SSH keys and node usernames/passwords/groups:
#cloud-config
coreos:
  etcd2:
    name: "kubernetes-4"
    initial-advertise-peer-urls: "http://NODE_IP:2380"
    listen-peer-urls: "http://NODE_IP:2380"
    listen-client-urls: "http://NODE_IP:2379,http://127.0.0.1:2379"
    advertise-client-urls: "http://NODE_IP:2379"
    initial-cluster-token: "etcd-cluster-1"
    initial-cluster: "kubernetes-4=http://MASTER_IP:2380,kubernetes-1=http://WORKER_1_IP:2380,kubernetes-2=http://WORKER_2_IP:2380,kubernetes-3=http://WORKER_3_IP:2380"
    initial-cluster-state: "new"
  units:
    - name: etcd2.service
      command: start
This is the content of the /etc/flannel/options.env file:
FLANNELD_IFACE=NODE_IP
FLANNELD_ETCD_ENDPOINTS=http://MASTER_IP:2379,http://WORKER_1_IP:2379,http://WORKER_2_IP:2379,http://WORKER_3_IP:2379
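Since flanneld needs these endpoints before any cluster networking is up, a sanity check I can run from each node is to hit etcd2's standard /health endpoint on every endpoint listed above (a healthy member returns {"health": "true"}):

```shell
# Query each configured etcd client endpoint; any connection refused or
# missing response here would break flanneld on that node at boot.
for ep in http://MASTER_IP:2379 http://WORKER_1_IP:2379 \
          http://WORKER_2_IP:2379 http://WORKER_3_IP:2379; do
  echo -n "$ep -> "
  curl -s "$ep/health"
  echo
done
```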
The same endpoints are listed under --etcd-servers in the kube-apiserver.yaml file.
Any ideas or suggestions as to what the problem could be? If any details are missing, let me know and I'll add them to the post.
Edit: I forgot to include kube-proxy logs.
Master node kube-proxy log:
$ kubectl logs kube-proxy-kubernetes-4
I0615 07:47:45.250631 1 server.go:225] Using iptables Proxier.
W0615 07:47:45.286923 1 server.go:469] Failed to retrieve node info: Get http://127.0.0.1:8080/api/v1/nodes/kubernetes-4: dial tcp 127.0.0.1:8080: getsockopt: connection refused
W0615 07:47:45.303576 1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0615 07:47:45.303593 1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0615 07:47:45.303646 1 server.go:249] Tearing down userspace rules.
E0615 07:47:45.357276 1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get http://127.0.0.1:8080/api/v1/endpoints?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E0615 07:47:45.357278 1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:46: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
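These refusals make it look like the apiserver wasn't listening on 127.0.0.1:8080 yet when kube-proxy started, so it might just be a startup-ordering race rather than a real misconfiguration. To tell the two apart, I'd check the insecure port directly on the master now that everything is up:

```shell
# On the master: the apiserver's insecure port answers 'ok' once it is up
curl -s http://127.0.0.1:8080/healthz
echo

# Recent kube-proxy log lines; if the errors stopped after the apiserver
# came up, it was only a race at boot
kubectl logs kube-proxy-kubernetes-4 | tail -n 5
```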
Worker node kube-proxy log:
$ kubectl logs kube-proxy-kubernetes-1
I0615 07:47:33.667025 1 server.go:225] Using iptables Proxier.
W0615 07:47:33.697387 1 server.go:469] Failed to retrieve node info: Get https://MASTER_IP/api/v1/nodes/kubernetes-1: dial tcp MASTER_IP:443: getsockopt: connection refused
W0615 07:47:33.712718 1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0615 07:47:33.712734 1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0615 07:47:33.712773 1 server.go:249] Tearing down userspace rules.
E0615 07:47:33.787122 1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get https://MASTER_IP/api/v1/endpoints?resourceVersion=0: dial tcp MASTER_IP:443: getsockopt: connection refused
E0615 07:47:33.787144 1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:46: Failed to list *api.Service: Get https://MASTER_IP/api/v1/services?resourceVersion=0: dial tcp MASTER_IP:443: getsockopt: connection refused
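The worker-side refusals on MASTER_IP:443 suggest the same thing for the secure port. From a worker node I can verify reachability with curl, using -k since the apiserver's certificate is probably not in the worker's trust store; even an "Unauthorized" response would prove the TCP connection itself works:

```shell
# From a worker: any HTTP response (even 401) means the secure port is
# reachable; 'connection refused' means the apiserver still isn't up there
curl -sk https://MASTER_IP/healthz
echo
```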