I have two GKE clusters, both running 1.20.8-gke.900 for Control Plane as well as nodes, and using preemptible nodes.

According to the GKE documentation:

> On preemptible GKE nodes running versions 1.20 or later, the kubelet graceful node shutdown feature is enabled by default. As a result, kubelet detects preemption and gracefully terminates Pods.

However, on one of these clusters, when running `kubectl get pods --all-namespaces`, I see many pods in the Shutdown state, such as:

kube-system      metrics-server-v0.3.6-9c5bbf784-w8cs2                       0/2     Shutdown   0          2d20h

On the other cluster there are no such pods, although both have the same config as far as I know. The only difference is that the cluster without those pods is private and the other is public, but that should not make any difference, should it?

Is there any setting or config difference that I have missed?

user140547
  • Can you describe one of the pods in the Shutdown state and add the events so we can take a look? – CaioT Aug 09 '21 at 15:09
  • @CaioT Well AFAIK the cluster with the pods in shutdown state is behaving as expected according to the documentation, see also https://stackoverflow.com/questions/68687716/k8s-pods-stuck-in-failed-shutdown-state-after-preemption-gke-v1-20 The cluster which does not have such pods is not behaving as I expect. – user140547 Aug 09 '21 at 15:33
  • Are you 100% sure that both clusters use preemptible nodes and that the only difference is that one of them is private and the other public? – mario Aug 10 '21 at 20:25
  • Well, I am. Strangely enough, I have now started getting pods in the Shutdown state on the other cluster as well, although nothing was changed. So I guess this can't be reproduced. Maybe it was a timing issue, although the nodes were upgraded to 1.20 on August 1 and 2, respectively (as part of the Regular release channel) – user140547 Aug 11 '21 at 07:53
  • Hi @user140547, any progress? Please note that "Graceful node shutdown" feature is in alpha in Kubernetes `v1.20`. This may explain random behaviour you are experiencing. In `v1.21` version it is in beta. Source: [Kubernetes blog - Graceful Node Shutdown Goes Beta](https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/), [Kubernetes v1.20 changelog](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md) – Mikolaj S. Aug 24 '21 at 14:21
  • For the time being it is working. It is true that it is Alpha, however it is activated by default on preemptible GKE 1.20 nodes https://cloud.google.com/kubernetes-engine/docs/release-notes#May_03_2021 – user140547 Aug 24 '21 at 15:42

1 Answer

Posted community wiki based on comments for better visibility.

For now, the difference between the two clusters is gone: pods in the Shutdown state, which at first appeared only on the public cluster, later showed up on the private cluster as well.

A possible cause of the unexpected and seemingly random behavior is that the "Graceful node shutdown" feature is in alpha in Kubernetes v1.20. In Kubernetes v1.21 it is in beta.
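
To compare the two clusters, it may help to list the pods left behind in this state rather than scanning the table by eye. Below is a minimal sketch in Python that filters a pod list shaped like the output of `kubectl get pods --all-namespaces -o json`. It assumes that such pods report a `status.phase` of `Failed` and a `status.reason` of `Shutdown`, which matches what kubectl displays on 1.20 but is not confirmed in this thread. The function name `find_shutdown_pods` and the sample data are hypothetical.

```python
import json


def find_shutdown_pods(pod_list):
    """Return (namespace, name) pairs for pods left in the Shutdown state.

    Assumption: these pods have status.phase == "Failed" and
    status.reason == "Shutdown" (the value kubectl shows in the STATUS column).
    """
    found = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == "Shutdown":
            meta = pod.get("metadata", {})
            found.append((meta.get("namespace"), meta.get("name")))
    return found


# Hypothetical sample mimicking `kubectl get pods --all-namespaces -o json`.
sample = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "kube-system",
                  "name": "metrics-server-v0.3.6-9c5bbf784-w8cs2"},
     "status": {"phase": "Failed", "reason": "Shutdown"}},
    {"metadata": {"namespace": "default", "name": "example-running-pod"},
     "status": {"phase": "Running"}}
  ]
}
""")

for namespace, name in find_shutdown_pods(sample):
    print(f"{namespace}\t{name}")
```

Against a real cluster, the same function could be fed the parsed JSON from the kubectl command above. Running it on both clusters would show whether they really differ or whether the Shutdown pods have simply not appeared yet on one of them.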

Mikolaj S.