We have a GKE cluster with auto-upgrading nodes. We recently noticed a node become unschedulable and eventually get deleted, and we suspect it was being upgraded automatically. Is there a way to confirm (or rule out) in Stackdriver that this was indeed the cause?
-
Not sure, but it should be doing a `cordon` and drain, in which case the kubelet would produce the line below, if Stackdriver is scraping that: kubelet[1319]: I0624 18:41:04.771532 1319 kubelet_node_status.go:447] Recording NodeNotSchedulable event message for node gke-squareroute-default-pool-9f095a99-s6z9 – eamon1234 Jun 24 '19 at 18:43
-
@eamon1234 Thanks -- yes, we do get logs with `NodeNotSchedulable` popping up at that time. I guess that's not entirely sufficient to know that it was caused by the automatic node upgrade (e.g. someone could be doing it manually), but it gets us a good way towards it. – Matt R Jun 25 '19 at 12:56
-
Normally you should see the upgrade as a node pool operation, but there is currently an issue where the logs are not created during an automatic upgrade. They still appear if you upgrade the node pool manually. – Patrick W Jun 25 '19 at 14:05
3 Answers
You can use the following advanced logs queries with Cloud Logging (previously Stackdriver) to detect upgrades to node pools:
protoPayload.methodName="google.container.internal.ClusterManagerInternal.UpdateClusterInternal"
resource.type="gke_nodepool"
and to the master:
protoPayload.methodName="google.container.internal.ClusterManagerInternal.UpdateClusterInternal"
resource.type="gke_cluster"
Additionally, you can control when updates are applied with maintenance windows (as the user aurelius mentioned).
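The two filters above can also be run from the command line instead of the Logs Explorer. The following is a minimal sketch, assuming `gcloud` is installed and authenticated against the right project; the function names `upgrade_filter` and `read_upgrade_logs` are made up for illustration, and the 30-day freshness and 20-entry limit are arbitrary choices.

```shell
#!/usr/bin/env bash

# Build the Cloud Logging filter from the answer for a given resource type:
# "gke_nodepool" for node pool upgrades, "gke_cluster" for master upgrades.
# (upgrade_filter is a hypothetical helper name, not part of gcloud.)
upgrade_filter() {
  local resource_type="$1"
  printf '%s AND resource.type="%s"' \
    'protoPayload.methodName="google.container.internal.ClusterManagerInternal.UpdateClusterInternal"' \
    "$resource_type"
}

# Read matching log entries as JSON.
# --freshness and --limit are arbitrary here; adjust them to the window you care about.
read_upgrade_logs() {
  gcloud logging read "$(upgrade_filter "$1")" \
    --freshness=30d \
    --limit=20 \
    --format=json
}
```

Calling `read_upgrade_logs gke_nodepool` would then list node pool upgrade entries, and `read_upgrade_logs gke_cluster` the master ones. Whether an entry shows up for automatic upgrades depends on the logging issue mentioned in the comments.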
-
Thank you. Is it possible to query whether the termination of a particular node was due to this upgrade event? – Traveler Sep 07 '21 at 17:26
I know it's not Cloud Logging, but another method to list the auto-upgrade operations is with gcloud. In Cloud Logging I could only find the completion of the upgrade, not the start.
gcloud container operations list
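To narrow that list down to upgrades only, the output can be filtered by operation type. This is a sketch under assumptions: it relies on the `operationType` field of the GKE operation resource taking the values `UPGRADE_NODES` and `UPGRADE_MASTER`, and the function name `list_upgrade_operations` is hypothetical. If the filter returns nothing, compare against the unfiltered output to check the exact type names your cluster reports.

```shell
#!/usr/bin/env bash

# List GKE operations of one type, oldest first.
# The argument defaults to UPGRADE_NODES; pass UPGRADE_MASTER for master upgrades.
# (list_upgrade_operations is a hypothetical helper name, not part of gcloud.)
list_upgrade_operations() {
  local op_type="${1:-UPGRADE_NODES}"
  gcloud container operations list \
    --filter="operationType=${op_type}" \
    --sort-by=startTime \
    --format="table(name, operationType, status, targetLink, startTime, endTime)"
}
```

The `startTime` and `endTime` columns can then be compared with the timestamp of the `NodeNotSchedulable` event for the node in question.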

I think your question has already been answered in the comments. Just as an addition: automatic upgrades occur at regular intervals at the discretion of the GKE team. To get more control, you can create a maintenance window as explained here. This is basically a time frame you choose in which automatic upgrades should occur.
