GCE micro instance-group autoscaling anomalies

Question

I have a managed instance group in a Google Cloud Platform project. The group uses the smallest predefined machine type that GCP offers, the f1-micro (more info here: https://cloud.google.com/compute/docs/machine-types#sharedcore).

I have autoscaling enabled on my instance group with these settings:

gcloud compute instance-groups managed set-autoscaling [my-ig] \
--region us-central1 \
--min-num-replicas=3 \
--max-num-replicas=15 \
--cool-down-period=250 \
--scale-based-on-cpu \
--target-cpu-utilization=0.9

I see some strange behaviour: after small, short peaks in CPU usage, the autoscaler massively scales up my instance group, only to go back to the original number of instances a few minutes later.

This is what the CPU graph of my instance group looks like. In this screenshot the instance group had autoscaling disabled and was running 3 instances of my app:

To me, those instances don't look like they need to autoscale; they seem stable with the power they have, and in practice the website performs very well.

This is what Google says about that type of VM instance:

f1-micro machine types offer bursting capabilities that allow instances to use additional physical CPU for short periods of time. Bursting happens automatically when your instance requires more physical CPU than originally allocated. During these spikes, your instance will opportunistically take advantage of available physical CPU in bursts

My problem is:

Are those spikes in the graph normal, given that each VM instance has 0.2 shared CPUs? Or should they not be there, even though the machine is so small?
With autoscaling on, the autoscaler adds instances like crazy on each rising edge of CPU activity, when in reality, if you average the CPU, there were no real spikes in usage, just small bursts that quickly stabilized.
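
To illustrate the averaging argument, here is a minimal sketch with invented per-minute samples (they are not read from the graph above). The sample values and the 5-sample window are assumptions chosen only to show that individual bursts can exceed the 0.9 target while the windowed mean stays well below it.

```python
from statistics import mean

# Hypothetical per-minute CPU utilization for one instance (1.0 = 100%).
# These numbers are made up for illustration only.
samples = [0.20, 0.25, 1.10, 0.30, 0.20, 0.22, 0.95, 0.25, 0.20, 0.21]

TARGET = 0.9  # same value as --target-cpu-utilization=0.9


def rolling_mean(values, window):
    """Return the mean of every full window of consecutive samples."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


# Individual samples that exceed the autoscaling target (the short bursts).
peaks_over_target = [s for s in samples if s > TARGET]

# The same data smoothed over a 5-sample window.
smoothed = rolling_mean(samples, window=5)

print("samples above target:", peaks_over_target)
print("highest 5-sample mean:", round(max(smoothed), 3))
print("smoothed load exceeds target:", max(smoothed) > TARGET)
```

With data shaped like this, a signal averaged over several minutes would never cross the target, even though two raw samples do.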

My options (I think) are:

use fewer instances of a larger size
~~use some stackdriver graph that averages cpu by 10minute average~~ (too expensive)
disable autoscaling and do it manually
fix the CPU spikes in my code (if possible, and if they are not normal behaviour for micro VMs)

score 1 · Accepted Answer · answered Jun 03 '17 at 16:58

You might be running into a combination of the bursting capability of the f1-micro instance class (which can push instance CPU utilization over 100%) and the way the autoscaler handles high CPU load.

During periods of heavy CPU utilization, if utilization reaches close to 100%, the autoscaler estimates that the group may already be heavily overloaded. In these cases, the autoscaler increases the number of virtual machines by at least an extra 50% or a minimum of 4 instances, whichever is higher. In general, CPU utilization within a managed instance group will not exceed 100%.
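
As a rough illustration of the rule described above, the sketch below computes the group size after one such overload reaction. It is only one reading of that sentence, not the autoscaler's actual algorithm: the function name `overloaded_scale_up`, the rounding up of the 50% figure, and the capping at the configured maximum are all assumptions.

```python
import math


def overloaded_scale_up(current_size, max_size):
    """Estimate the group size after an 'overloaded' scale-up.

    Assumed interpretation: add the larger of 50% of the current size
    (rounded up) or 4 instances, then cap at the configured maximum.
    """
    extra = max(math.ceil(current_size * 0.5), 4)
    return min(current_size + extra, max_size)


# With the settings from the question (min 3, max 15):
print(overloaded_scale_up(3, 15))   # 3 + max(2, 4) = 7
print(overloaded_scale_up(12, 15))  # 12 + 6 = 18, capped at 15
```

Under this reading, a single burst close to 100% on a 3-instance group is enough to more than double it, which would match the behaviour described in the question.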

I think you've outlined your options pretty well. I would recommend checking whether you can manage with no autoscaling at all.

If your application's load follows a diurnal rhythm (no traffic at night, high load during the day), you could adjust the instance group size semi-automatically (think calling the GCE API or gcloud from cron).
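
A minimal sketch of the cron idea, assuming autoscaling has been turned off for the group. The group name `my-ig`, the day and night sizes, and the hour boundaries are placeholders to adapt; the script only builds and runs a `gcloud compute instance-groups managed resize` command, and it prints the command instead of running it unless `--apply` is passed.

```python
import subprocess
import sys
from datetime import datetime

GROUP = "my-ig"               # placeholder group name
REGION = "us-central1"        # region used in the question
DAY_SIZE, NIGHT_SIZE = 6, 3   # hypothetical group sizes
DAY_START, DAY_END = 8, 22    # hypothetical busy hours (local time)


def target_size(hour):
    """Pick the group size for a given hour of the day (0-23)."""
    return DAY_SIZE if DAY_START <= hour < DAY_END else NIGHT_SIZE


def resize_command(size):
    """Build the gcloud command that resizes the managed instance group."""
    return [
        "gcloud", "compute", "instance-groups", "managed", "resize", GROUP,
        "--size={}".format(size),
        "--region={}".format(REGION),
    ]


if __name__ == "__main__":
    cmd = resize_command(target_size(datetime.now().hour))
    if "--apply" in sys.argv:
        subprocess.run(cmd, check=True)  # actually resize the group
    else:
        print(" ".join(cmd))             # dry run: only show the command
```

A crontab entry such as `0 * * * * python3 /path/to/resize_ig.py --apply` (the script path is hypothetical) would then re-check the size once per hour.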

Yes, thank you very much, I really needed some validation from someone else. I have gone with no-autoscaling and will make a setup with a cron like you suggested. — santiago arizti, Jun 05 '17 at 15:39
