Setting both `min_pending_latency` and `max_pending_latency` sends mixed messages to the autoscaler. More generally, you can tweak the autoscaler either to contain your costs (set a low value for `max_idle_instances` and/or a high one for `min_pending_latency`), or to improve your scalability -- that is, keep latency low for surges of traffic (set a high value for `min_idle_instances` and/or a low one for `max_pending_latency`).
Don't mix the two kinds of tweaks -- in my experience, such "mixed messages" never improve either costs or latency during a surge.
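To make the two stances concrete, here is a sketch of the `automatic_scaling` stanza in `app.yaml` -- the specific values are illustrative only, not recommendations, and you'd pick one stance or the other, never both:

```yaml
# Cost-containment leaning: few spare instances, tolerate some queueing.
automatic_scaling:
  max_idle_instances: 1       # low: don't pay for spare idle instances
  min_pending_latency: 500ms  # high: let requests wait before spinning up new instances

# Scalability leaning (the alternative stance -- don't combine with the above):
# automatic_scaling:
#   min_idle_instances: 3      # high: keep warm instances ready for surges
#   max_pending_latency: 30ms  # low: spin up quickly as soon as requests queue
```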
And yes, I am working to have this fundamental bit of information become part of Google Cloud Platform's official docs -- it's just taking longer than I hoped, which is why, meanwhile, I am posting this answer.
A more advanced alternative, if you're very certain about your patterns of traffic over time, possibilities of surges, and so forth, is to switch from auto-scaled modules to basic-scaled or even manual-scaled ones, writing your own code to start and terminate instances via the Modules API.
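As a sketch of that approach: the first-generation Python Modules API (`google.appengine.api.modules`) only works inside a deployed App Engine app, so below I've separated the scaling *policy* (pure Python, sized from a backlog measure) from the API calls that apply it. All the constants and function names here are my own hypothetical choices, not anything the platform prescribes:

```python
# Hypothetical policy: one instance per 100 queued items, clamped.
QUEUE_DEPTH_PER_INSTANCE = 100
MIN_INSTANCES = 1
MAX_INSTANCES = 10

def target_instances(queue_depth):
    """Pick an instance count proportional to backlog, clamped to bounds."""
    wanted = -(-queue_depth // QUEUE_DEPTH_PER_INSTANCE)  # ceiling division
    return max(MIN_INSTANCES, min(MAX_INSTANCES, wanted))

def apply_scaling(queue_depth, module='worker'):
    """Resize a manual-scaled module; runs only inside App Engine."""
    from google.appengine.api import modules
    current = modules.get_num_instances(module=module)
    wanted = target_instances(queue_depth)
    if wanted != current:
        modules.set_num_instances(wanted, module=module)
```

You'd call something like `apply_scaling` from a cron job or task-queue handler that measures the backlog -- which is exactly where, as I say next, my own predictions of traffic tended to fall short.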
Although, I have to admit, this never worked optimally for me for modules dedicated to serving user traffic (as opposed to task-queue or cron-based "backend" work) -- my users' surges and time patterns never turned out to be as predictable going forward as analyzing past records tantalizingly suggested. So, in the end, I always went back (for user-traffic serving) to good old auto-scaling, perhaps with the modest tweaks, either to reduce costs or to improve scalability, that I recommend above.