We recently experienced an issue where our master Dataproc instance rebooted and some of our services didn't start up cleanly. We're not sure what triggered the reboot, but the logs suggested that it was GCP maintenance. Although we could likely use Stackdriver Monitoring to catch and act on these types of events, it raised the question of whether GCP has a service that could notify us of maintenance, either ahead of time or at the time the maintenance actions are taken. Any tips would be appreciated!
3 Answers
GCP does not reboot VMs for scheduled maintenance. Instead, the VM is live migrated, which avoids reboots and downtime during maintenance.
Compute Engine offers live migration to keep your virtual machine instances running even when a host system event occurs, such as a software or hardware update. Compute Engine live migrates your running instances to another host in the same zone rather than requiring your VMs to be rebooted. This allows Google to perform maintenance that is integral to keeping infrastructure protected and reliable without interrupting any of your VMs. [source]
However, if the hardware your VM is running on fails, your VM may experience a reboot.

As kasperd says, VMs are generally (optionally) live-migrated during maintenance, so you may have observed a hardware failure. There are exceptions, however, including instances with GPU accelerators, as documented here: GCP Maintenance Events
Google does provide a way to get notice of pending maintenance events: poll the instance metadata server from inside the VM, like this: curl http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event -H "Metadata-Flavor: Google"
A reply of NONE indicates that no event is pending or in progress.
They also provide a framework for a wrapper script in Python that avoids repeatedly polling this URL: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/compute/metadata/main.py
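For illustration, here is a minimal sketch of that "wait instead of poll" idea, using only the standard library. It is not Google's sample: the function names `fetch_event` and `watch`, the `max_changes` parameter, and the timeout values are my own assumptions. It relies on the metadata server's `wait_for_change` and `last_etag` query parameters, which make the request block until the value changes.

```python
import time
import urllib.parse
import urllib.request

METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/maintenance-event"
)
HEADERS = {"Metadata-Flavor": "Google"}


def fetch_event(last_etag="0", timeout=120):
    """Hanging GET: blocks until the value differs from last_etag.

    Returns (event, etag). Only works from inside a Compute Engine VM.
    """
    query = urllib.parse.urlencode(
        {"wait_for_change": "true", "last_etag": last_etag}
    )
    request = urllib.request.Request(f"{METADATA_URL}?{query}", headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode().strip()
        return body, response.headers.get("ETag", last_etag)


def watch(on_event, fetch=fetch_event, max_changes=None):
    """Call on_event(event) on every change; event is None when the reply is NONE.

    max_changes is a hypothetical knob so the loop can be stopped in tests.
    """
    last_etag = "0"
    seen = 0
    while max_changes is None or seen < max_changes:
        try:
            event, last_etag = fetch(last_etag)
        except OSError:
            # Covers URLError, HTTPError (e.g. 503) and socket timeouts.
            time.sleep(1)
            continue
        seen += 1
        on_event(None if event == "NONE" else event)
```

On a VM you could run something like `watch(print)` under a process supervisor and replace `print` with a callback that drains or restarts your services when the event is not `None`.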

Is that API supposed to cover all maintenance events or only maintenance events where it's known in advance that live migration or VM reboot is inevitable? – kasperd Feb 13 '19 at 22:01
How can we take control of these maintenance events? I want Google to perform this maintenance during my project's non-business hours, i.e. Saturday and Sunday. Is there any option for us to set a maintenance schedule of our own?

If you have a new question, please ask it by clicking the [Ask Question](https://serverfault.com/questions/ask) button. Include a link to this question if it helps provide context. - [From Review](/review/late-answers/546698) – fission Mar 24 '23 at 07:07