calico-etcd not scheduled on GKE 1.11 k8s

Question

I recently upgraded my GKE cluster from 1.10.x to 1.11.x, and since then my calico-node pods fail to connect to the etcd cluster and end up in CrashLoopBackOff due to livenessProbe failures.

I noticed that the calico-etcd DaemonSet has a desired count of 0 and was wondering why. Its nodeSelector is node-role.kubernetes.io/master=.

From the logs of such calico-nodes:

2018-12-19 19:18:28.989 [INFO][7] etcd.go 373: Unhandled error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout

2018-12-19 19:18:28.989 [INFO][7] startup.go 254: Unable to query node configuration Name="gke-brokerme-ubuntu-pool-852d0318-j5ft" error=client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout
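
To confirm independently of calico-node that nothing answers at the endpoint named in these logs, one option is to probe etcd's `/health` endpoint directly. The following is a minimal sketch, not part of Calico or GKE tooling: it assumes a plain-HTTP endpoint like the one in the log and that `/health` returns JSON with a `health` field. The function name `etcd_is_healthy` is a hypothetical helper.

```python
import json
import urllib.error
import urllib.request


def etcd_is_healthy(endpoint: str, timeout: float = 2.0) -> bool:
    """Probe <endpoint>/health; return True only if it reports healthy in time.

    Assumption: a plain-HTTP etcd endpoint (as in the calico-node log) whose
    /health path returns JSON such as {"health": "true"}.
    """
    url = endpoint.rstrip("/") + "/health"
    # Bypass any http_proxy settings: the target is a cluster-internal address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Refused connection, timeout, non-2xx status or non-JSON body: unhealthy.
        return False
    if not isinstance(body, dict):
        return False
    # Accept either the string "true" or a JSON boolean true.
    return str(body.get("health")).lower() == "true"
```

Run it from a pod or node that can reach the Service IP, e.g. `etcd_is_healthy("http://10.96.232.136:6666")`. A `False` result there would be consistent with the "exceeded header timeout" errors above.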

State of the DaemonSets:

NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
calico-etcd                0         0         0       0            0           node-role.kubernetes.io/master=                3d
calico-node                2         2         0       2            0           <none>                                         3d

Output of kubectl get nodes --show-labels:

NAME                                     STATUS   ROLES    AGE   VERSION         LABELS
gke-brokerme-ubuntu-pool-852d0318-7v4m   Ready    <none>   4d    v1.11.5-gke.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,cloud.google.com/gke-nodepool=ubuntu-pool,cloud.google.com/gke-os-distribution=ubuntu,failure-domain.beta.kubernetes.io/region=europe-west1,failure-domain.beta.kubernetes.io/zone=europe-west1-b,kubernetes.io/hostname=gke-brokerme-ubuntu-pool-852d0318-7v4m,os=ubuntu
gke-brokerme-ubuntu-pool-852d0318-j5ft   Ready    <none>   1h    v1.11.5-gke.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,cloud.google.com/gke-nodepool=ubuntu-pool,cloud.google.com/gke-os-distribution=ubuntu,failure-domain.beta.kubernetes.io/region=europe-west1,failure-domain.beta.kubernetes.io/zone=europe-west1-b,kubernetes.io/hostname=gke-brokerme-ubuntu-pool-852d0318-j5ft,os=ubuntu
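
The DESIRED count of 0 for calico-etcd follows from matching its nodeSelector against these labels: neither node carries node-role.kubernetes.io/master. Below is a minimal sketch of that comparison. It models only plain nodeSelector equality; the real DaemonSet controller also considers taints, tolerations and node affinity. The helper names are hypothetical, not a Kubernetes API.

```python
from typing import Dict, List


def parse_labels(column: str) -> Dict[str, str]:
    """Parse the LABELS column of `kubectl get nodes --show-labels` into a dict."""
    labels: Dict[str, str] = {}
    for item in column.split(","):
        if item:
            key, _, value = item.partition("=")
            labels[key] = value
    return labels


def matches(node_selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """A nodeSelector matches when every key exists on the node with exactly that value."""
    return all(k in labels and labels[k] == v for k, v in node_selector.items())


def desired_count(node_selector: Dict[str, str], nodes: List[Dict[str, str]]) -> int:
    """Count the nodes a DaemonSet would target, looking ONLY at its nodeSelector.

    Taints, tolerations and affinity are deliberately ignored in this sketch.
    """
    return sum(1 for labels in nodes if matches(node_selector, labels))
```

With the two nodes above, `desired_count({"node-role.kubernetes.io/master": ""}, nodes)` yields 0, while calico-node's empty selector matches both nodes, mirroring the DESIRED values of 0 and 2.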

I did not modify any Calico manifests; they should be provisioned 1:1 by GKE.

I would expect the calico-nodes to connect either to the etcd of my Kubernetes cluster or to a calico-etcd provisioned by the DaemonSet. As there is no master node that I can control in GKE, I kind of get why calico-etcd is at 0, but then, which etcd are the calico-nodes supposed to connect to? What's wrong with my small and basic setup?
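
Which etcd calico-node talks to is decided by its own configuration, not by where calico-etcd happens to be scheduled. Assuming the manifest passes the address through Calico's ETCD_ENDPOINTS environment variable, either inline or via a configMapKeyRef, the sketch below pulls it out of the DaemonSet's JSON. The function name and the shape of the `configmaps` argument are hypothetical; check the env layout of your own manifest.

```python
from typing import Any, Dict, Optional


def etcd_endpoints(
    daemonset: Dict[str, Any],
    configmaps: Dict[str, Dict[str, str]],
    container: str = "calico-node",
) -> Optional[str]:
    """Return the ETCD_ENDPOINTS value configured for `container`, or None.

    Handles an inline `value` and a `valueFrom.configMapKeyRef`. `configmaps`
    maps a ConfigMap name to its `data` section (an assumption of this sketch).
    """
    for c in daemonset["spec"]["template"]["spec"]["containers"]:
        if c.get("name") != container:
            continue
        for env in c.get("env", []):
            if env.get("name") != "ETCD_ENDPOINTS":
                continue
            if "value" in env:
                return env["value"]
            ref = env.get("valueFrom", {}).get("configMapKeyRef")
            if ref:
                return configmaps.get(ref["name"], {}).get(ref["key"])
    return None
```

Feed it the parsed output of `kubectl -n kube-system get daemonset calico-node -o json` plus the `data` of any ConfigMap it references (the kube-system namespace is an assumption). If the result is http://10.96.232.136:6666, calico-node is pointed at whatever Service owns that IP, which `kubectl get svc --all-namespaces` can confirm.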

Answer

We are aware of the issue of Calico crash-looping on GKE 1.11.x. You can fix it by upgrading to a newer version; I would recommend upgrading to '1.11.4-gke.12' or '1.11.3-gke.23', which do not have this issue.

answered Dec 20 '18 at 17:51

John Mathew


I am currently on the latest version, which is `v1.11.5-gke.5`. As I anticipated it might be a GKE issue, I waited for at least one upgrade before posting here. However, the issue still remains, unfortunately. – SoJeN Dec 21 '18 at 15:47
Here is a [public tracker](https://issuetracker.google.com/120255782) showing how other people have resolved this issue. As you can see in the link, upgrading the masters to 1.11.5-gke.4 should resolve the Calico issue. If you are still affected, you can report a bug through the [public issue tracker](https://cloud.google.com/support/docs/issue-trackers) under "Compute" > "Google Kubernetes Engine issues", including your specific steps to reproduce. – mehdi sharifi Dec 28 '18 at 23:45
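
To check a cluster against a threshold such as 1.11.5-gke.4, the version strings should be compared numerically rather than as text (as text, "1.11.10" sorts before "1.11.5"). A sketch assuming the MAJOR.MINOR.PATCH-gke.N format seen in this thread; the helper names are hypothetical. Note that the VERSION column of `kubectl get nodes` shows the kubelet (node) version, whereas the comment refers to the master version, which `kubectl version` reports as the server version.

```python
import re
from typing import Tuple

# Assumed format: optional "v", then MAJOR.MINOR.PATCH-gke.BUILD.
_GKE_VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version: str) -> Tuple[int, int, int, int]:
    """Split e.g. 'v1.11.5-gke.5' into (1, 11, 5, 5); raise ValueError otherwise."""
    m = _GKE_VERSION.match(version.strip())
    if m is None:
        raise ValueError(f"not a GKE version string: {version!r}")
    major, minor, patch, build = (int(g) for g in m.groups())
    return (major, minor, patch, build)


def is_at_least(version: str, minimum: str) -> bool:
    """Compare field by field as integers: (major, minor, patch, gke build)."""
    return parse_gke_version(version) >= parse_gke_version(minimum)
```

For example, `is_at_least("v1.11.5-gke.5", "1.11.5-gke.4")` is true, while `is_at_least("1.11.4-gke.12", "1.11.5-gke.4")` is false.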
