
We are running a machine-learning service on a GPU instance in Google Cloud. Google Cloud performs unscheduled maintenance on the hosts of its GPU instances. Before an instance is taken down for maintenance, Google Cloud sends a notification to that instance 1 hour in advance.

Suppose the GPU instance is named "vm1". We want to spin up a backup GPU instance, "vm1-duplicate", as soon as "vm1" receives notice that it will go through maintenance, so that service to our clients is not affected by the maintenance. Once the maintenance on "vm1" is completed, we want to stop "vm1-duplicate" so that there are no extra costs.

Is there an elegant way in Google Cloud to automatically start and stop VMs based on conditions, triggers, or events?



Super quick solution: install gcloud on the VM (it should already be there), create a service account [1], write a small shell script, and run it from cron every 15 or 30 minutes.

The script polls the maintenance-event HTTP endpoint [2] and, when maintenance is announced, spawns a new VM from within your VM using the gcloud command line. Turn off automatic restart and simply keep the new VM up and running.
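A minimal sketch of such a polling script, written in Python rather than shell. It follows the question's start/stop variant instead of keeping the new VM permanently, so it assumes automatic restart stays enabled on vm1. Further assumptions: "vm1-duplicate" already exists in a stopped state, the VM's service account is allowed to start and stop instances, and `ZONE`, the marker path `/var/tmp/backup-started` and the file name `maintenance_watch.py` are hypothetical placeholders. The `instance/maintenance-event` metadata path and the `Metadata-Flavor: Google` header are what the metadata server expects; check [2] for the exact event values reported for GPU VMs.

```python
import subprocess
import sys
import urllib.request
from pathlib import Path
from typing import List, Optional

# Metadata-server endpoint that reports pending host maintenance.
METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/maintenance-event")
BACKUP_VM = "vm1-duplicate"
ZONE = "us-central1-a"                    # hypothetical: zone of the backup VM
MARKER = Path("/var/tmp/backup-started")  # hypothetical state file


def fetch_maintenance_event(timeout: float = 5.0) -> str:
    """Return the pending event, e.g. 'NONE' or 'TERMINATE_ON_HOST_MAINTENANCE'."""
    req = urllib.request.Request(METADATA_URL,
                                 headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip()


def plan(event: str, backup_started: bool) -> Optional[str]:
    """Decide what one cron run should do: 'start', 'stop' or None."""
    if event != "NONE" and not backup_started:
        return "start"  # maintenance announced: bring the backup up once
    if event == "NONE" and backup_started:
        return "stop"   # vm1 is back after maintenance: stop paying for the backup
    return None         # nothing changed since the last run


def gcloud_cmd(action: str, vm: str = BACKUP_VM, zone: str = ZONE) -> List[str]:
    """Build `gcloud compute instances <action> <vm> --zone=<zone>`."""
    return ["gcloud", "compute", "instances", action, vm, f"--zone={zone}"]


def main() -> None:
    action = plan(fetch_maintenance_event(), MARKER.exists())
    if action is None:
        return
    subprocess.run(gcloud_cmd(action), check=True)  # raises if gcloud fails
    if action == "start":
        MARKER.touch()
    else:
        MARKER.unlink()


# Only act when called with --run, so importing the module has no side effects.
if __name__ == "__main__" and sys.argv[1:] == ["--run"]:
    main()
```

A crontab entry such as `*/15 * * * * /usr/bin/python3 /opt/maintenance_watch.py --run` checks every 15 minutes, which still leaves at least 45 of the 60 minutes of notice. The marker file keeps repeated cron runs from starting or stopping the backup more than once, and `/var/tmp` is typically preserved across vm1's restart after maintenance.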

If this works, consider deriving your own image with the script and cron job already in place, so that respawning is easier.
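For the image step, a hedged sketch that only builds the gcloud commands and prints them; nothing is executed. It assumes vm1's boot disk is also named "vm1" (the default when the boot disk is created together with the instance, but check yours), and the image name `vm1-watch-image`, the zone, the machine type and the accelerator string are hypothetical. `gcloud compute images create` refuses a disk attached to a running VM unless you stop the VM first or add `--force`. GPU VMs are generally created with `--maintenance-policy=TERMINATE` because they cannot live-migrate.

```python
from typing import List


def image_from_disk_cmd(image: str, disk: str, zone: str) -> List[str]:
    """Image vm1's boot disk, with the watch script and crontab already on it."""
    return ["gcloud", "compute", "images", "create", image,
            f"--source-disk={disk}", f"--source-disk-zone={zone}"]


def vm_from_image_cmd(name: str, image: str, zone: str,
                      machine_type: str, accelerator: str) -> List[str]:
    """Create a GPU VM from that image; GPU VMs use the TERMINATE policy."""
    return ["gcloud", "compute", "instances", "create", name,
            f"--zone={zone}", f"--image={image}",
            f"--machine-type={machine_type}",
            f"--accelerator={accelerator}",
            "--maintenance-policy=TERMINATE"]


if __name__ == "__main__":
    # Hypothetical values: print the commands instead of running them.
    zone = "us-central1-a"
    print(" ".join(image_from_disk_cmd("vm1-watch-image", "vm1", zone)))
    print(" ".join(vm_from_image_cmd("vm1-duplicate", "vm1-watch-image", zone,
                                     "n1-standard-8",
                                     "type=nvidia-tesla-t4,count=1")))
```

Passing either list to `subprocess.run(..., check=True)` would run it for real.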

[1] Not mandatory; it just avoids putting your credentials on the VM.

[2] https://cloud.google.com/compute/docs/gpus/gpu-host-maintenance

matteo nunziati