Script to create multiple GCE VMs simultaneously

Question

I have a basic shell script that I use to create multiple VMs on GCP. It works fine, but sequentially. When the number of VMs is above, say, 4 or 5, the delay becomes material. I noticed that on platforms like Dataflow or Dataproc, an arbitrary number of VMs gets created virtually simultaneously. Is there a way to mimic that functionality in GCE? (After all, these seem to be basic GCE machines anyway.)

Right now I use the following (simplified) script:

vms=4
for i in $(seq 1 "$vms")
do
    gcloud compute --project PROJECT disks create VM"$i" --size 50 --zone ZONE --type "pd-ssd"
    gcloud beta compute --project=PROJECT instances create VM"$i" --zone=ZONE --machine-type=MACHINE_TYPE --subnet=default --maintenance-policy=MIGRATE --scopes=https://www.googleapis.com/auth/cloud-platform --disk=name=VM"$i",device-name=VM"$i",mode=rw,boot=yes,auto-delete=yes
done

Thanks for any suggestions!

score 4 · Accepted Answer · answered Aug 23 '18 at 05:15

You can create multiple similar VMs faster by creating a managed instance group.

First, create an instance template, specifying the VM configuration that you need:

# IMAGE_PROJECT: project where your boot disk image is stored
# IMAGE: boot disk image name
# DEVICE_NAME: boot disk device name, the same for all VMs
gcloud compute instance-templates create TEMPLATE_NAME \
  --machine-type MACHINE_TYPE \
  --image-project IMAGE_PROJECT \
  --image IMAGE \
  --boot-disk-type pd-ssd \
  --boot-disk-size 50GB \
  --boot-disk-auto-delete \
  --boot-disk-device-name DEVICE_NAME \
  --subnet default \
  --maintenance-policy MIGRATE \
  [...]

Note:

You specify the boot disk as part of the instance template.
There is no need to specify a zone for the instance template. You specify the desired zone when you create the instance group.
The device name is the same for the boot disks of all VMs in the group. This is not a conflict, because a disk's device name is seen from the guest OS of its VM and is local to that VM.
Other parameters are the same as those for creating a VM.

Then, create a group of 4 (or 100, or 1000+) VMs, based on this template:

# TEMPLATE_NAME: name of the instance template that you have just created
# --size: number of VMs that you need to create
gcloud compute instance-groups managed create GROUP_NAME \
  --zone ZONE \
  --template TEMPLATE_NAME \
  --size 4

The group creates multiple similar VMs, based on your template, much faster than you could by creating standalone VMs one at a time in a loop.
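Since the group generates the instance names itself, it can help to wait until the group has settled and then list what was created. The following is only a sketch: `wait_and_list` is a hypothetical helper name, and it assumes a gcloud version that provides `instance-groups managed wait-until --stable` (older releases may use a different command).

```shell
# Sketch: wait for a managed instance group to become stable,
# then list the VMs it created.
# wait_and_list is a hypothetical helper, not part of gcloud.

# wait_and_list GROUP_NAME ZONE
wait_and_list() {
  group=$1
  zone=$2

  # Block until all VMs in the group are created and running.
  gcloud compute instance-groups managed wait-until "$group" \
    --stable --zone "$zone" || return 1

  # Print the instances (including their generated names).
  gcloud compute instance-groups managed list-instances "$group" \
    --zone "$zone"
}
```

Usage would look like `wait_and_list GROUP_NAME ZONE`, run after the `instance-groups managed create` command above.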

Thanks, this works really well. The only caveat compared with the script is that there seems to be no way of controlling the exact host names of the machines (they all end with a random suffix). Also, unlike the script, this seems to work from an existing image rather than a snapshot. But it works really fast, so these are very minor issues. – VS_FF, Aug 23 '18 at 09:28

score 1 · Answer 2 · answered Aug 22 '18 at 16:12


Direct way

A quick win would be to add the --async flag to the gcloud command, so that it returns without waiting for the operation to complete. You can also run the commands in parallel in bash with & and wait:

for i in $(seq 1 4)
do
  gcloud compute instances [...] --async &
done
wait

Alternative

You can also use Terraform to do this in a different way.

Yann C.

Thanks for the async suggestion -- that sounds great. In that case, how would I coordinate the creation of the hard drive and the VM in each iteration of the loop? Or I suppose I could create the drives async first and then create the VMs based on those drives async as well? (I'm clueless in bash TBH) – VS_FF Aug 22 '18 at 16:24
Yes, I would create the persistent disks first, in a dedicated loop without the `--async` flag but with `&` and `wait` for parallel creation. The second loop can begin only once the disks are ready to be used. Be careful with gcloud errors: many calls can fail (quota, permission, bad requests, etc.) – Yann C. Aug 23 '18 at 06:29
Thanks for the suggestions. Accepting the other answer, because it seems much more GCP-native. – VS_FF Aug 23 '18 at 09:25
Indeed, it depends on what you want to do. For example, if you want stable instance names (for simple discovery purposes or whatever), you can't use a managed instance group. Even worse, additional persistent disks are not supported with instance templates AFAIK. But if you do not need these features, it is much cleaner to use a managed instance group. – Yann C. Aug 23 '18 at 10:16
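
The two-loop approach discussed in these comments (disks first, in parallel, then VMs) could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's exact script: `run_parallel`, `create_disk`, `create_vm` and `create_all` are hypothetical helper names, PROJECT, ZONE and MACHINE_TYPE are the placeholders from the question, and the gcloud flags are a shortened copy of the ones in the question's script.

```shell
# Sketch: create all disks in parallel, wait, then create all VMs in parallel.
# All function names here are hypothetical helpers.

# run_parallel COUNT CMD [ARGS...]
# Runs "CMD ARGS... I" in the background for I = 1..COUNT, then waits for
# every job. Returns non-zero if at least one job failed.
run_parallel() {
  count=$1
  shift
  pids=""
  for i in $(seq 1 "$count"); do
    "$@" "$i" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    # Waiting on each PID separately lets us see each job's exit status.
    wait "$pid" || failed=1
  done
  return "$failed"
}

# No --async here: the command must block until the disk is ready.
create_disk() {
  gcloud compute --project PROJECT disks create VM"$1" \
    --size 50 --zone ZONE --type pd-ssd
}

create_vm() {
  gcloud compute --project=PROJECT instances create VM"$1" \
    --zone=ZONE --machine-type=MACHINE_TYPE \
    --disk=name=VM"$1",device-name=VM"$1",mode=rw,boot=yes,auto-delete=yes
}

# create_all COUNT: the VM loop starts only if every disk was created.
create_all() {
  run_parallel "$1" create_disk || {
    echo "disk creation failed, not creating VMs" >&2
    return 1
  }
  run_parallel "$1" create_vm
}
```

Calling `create_all 4` would then create four disks and four VMs. Waiting on each background job individually, rather than using a bare `wait`, is what lets the script notice the quota or permission failures mentioned above.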

score 0 · Answer 3 · answered May 18 '21 at 22:55

The bulk instance creation APIs allow you to create many VMs with a single API request:

# IMAGE_PROJECT: project where your boot disk image is stored
# IMAGE: boot disk image name
# DEVICE_NAME: boot disk device name, the same for all VMs
gcloud compute instances bulk create \
  --name-pattern="VM#" \
  --count=4 \
  --region=REGION \
  --machine-type=MACHINE_TYPE \
  --image-project=IMAGE_PROJECT \
  --image=IMAGE \
  --boot-disk-type=pd-ssd \
  --boot-disk-size=50GB \
  --boot-disk-auto-delete \
  --boot-disk-device-name=DEVICE_NAME \
  --subnet=default \
  --maintenance-policy=MIGRATE \
  [...]

Note that:

All VMs are created in parallel
It automatically chooses the zone in which to create VMs, based on where there's availability
It will fail upfront if it detects some issue which prevents creation of the full request (e.g., if there's not enough capacity)
It can automatically generate names for you
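
If you need exact instance names rather than generated ones, one option is to pass an explicit list of names instead of a pattern. The sketch below assumes your gcloud version supports the `--predefined-names` flag of `bulk create` (check `gcloud compute instances bulk create --help`); `vm_names` and `bulk_create_named` are hypothetical helper names, and the remaining flags are a shortened subset of the command above.

```shell
# Sketch: bulk-create VMs with an explicit list of names.
# vm_names and bulk_create_named are hypothetical helpers.

# vm_names PREFIX COUNT
# Prints "PREFIX1,PREFIX2,...,PREFIXCOUNT" (comma-separated, no spaces).
vm_names() {
  prefix=$1
  count=$2
  names=""
  i=1
  while [ "$i" -le "$count" ]; do
    # ${names:+$names,} adds a comma only when names is already non-empty.
    names="${names:+$names,}$prefix$i"
    i=$((i + 1))
  done
  printf '%s\n' "$names"
}

# bulk_create_named PREFIX COUNT
# Assumption: --predefined-names is available and used instead of --name-pattern.
bulk_create_named() {
  gcloud compute instances bulk create \
    --predefined-names="$(vm_names "$1" "$2")" \
    --region=REGION \
    --machine-type=MACHINE_TYPE \
    --image-project=IMAGE_PROJECT \
    --image=IMAGE
}
```

For example, `bulk_create_named vm 4` would request the names vm1, vm2, vm3 and vm4 in a single call.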
