GCE autoscaling based on custom metrics provided by the running instances?

Question

I have a managed instance group with autoscaling.

Each instance runs a program that polls a remote site asking "do you have any work for me?" If the answer is "no", it goes to sleep for a few seconds and repeats. We consider this an idle instance. Otherwise, it receives instructions for what to do, marks itself busy, does what it has to do (it can take anywhere from a few minutes to a few hours), returns the results, and goes back to being idle.

I want the autoscaler to make sure there is always at least one idle instance, so it can pick up any available work. It cannot be CPU-based, as the jobs can spend significant amounts of time not really using much CPU, or they may not have enough parallelism to use all cores, and so on.

If it were possible for the autoscaler to scrape an arbitrary metrics server for a particular metric, life would be simple: Each instance is already running a Prometheus node exporter, so it can export a metric such as is_busy, set to 0 or 1, which Prometheus can then aggregate, add 1 to the sum, and export as the metric that the autoscaler could use. But this is not possible.
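To make the intended rule concrete, here is a minimal sketch of the aggregation described above: sum the per-instance `is_busy` values and add one spare. The function name `desired_group_size` and the `spare_idle` parameter are hypothetical, introduced only for illustration. The sketch says nothing about how the resulting number would reach the autoscaler.

```python
from typing import Iterable


def desired_group_size(is_busy_values: Iterable[int], spare_idle: int = 1) -> int:
    """Return the number of busy instances plus the idle headroom wanted.

    is_busy_values: one 0/1 value per running instance (the is_busy metric).
    spare_idle: how many idle instances should always be available
                (hypothetical parameter; the question asks for 1).
    """
    busy = sum(1 for value in is_busy_values if value)
    return busy + spare_idle


if __name__ == "__main__":
    # Three instances, two of them busy: one is already idle, so stay at 3.
    print(desired_group_size([1, 1, 0]))
    # All three busy: grow to 4 so that one idle instance exists.
    print(desired_group_size([1, 1, 1]))
```

With this rule the group never shrinks below one instance, because an empty set of reports still yields `spare_idle`.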

Google's documentation of how to use Prometheus metrics for the autoscaler, even though it is linked to from the GCE MIG page, only talks about how to do it for Kubernetes autoscaling, which, of course, is not what I am using here.

I have already thought about having each instance create a custom Stackdriver metric and keep it updated, but if an instance dies before setting its metric back to 0, the metric will never be reset, and the autoscaler will not know that the instance is gone.
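One possible way around the stale-value problem, sketched under assumptions: treat each report as a heartbeat with a timestamp, and have whatever aggregates the reports ignore any that are older than a cutoff. The `Report` type, the `count_busy` function, and the 60-second default are all hypothetical. Where such an aggregator would run is left open, and nothing here is confirmed by Google's documentation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Report:
    """Last known state of one instance (hypothetical structure)."""
    is_busy: bool
    reported_at: float  # Unix time in seconds of the last heartbeat


def count_busy(reports: Dict[str, Report], now: float, max_age_s: float = 60.0) -> int:
    """Count instances that are busy AND have reported recently.

    A report older than max_age_s is assumed to come from a dead instance
    and is ignored, so a crash while busy cannot pin the count forever.
    """
    return sum(
        1
        for report in reports.values()
        if report.is_busy and (now - report.reported_at) <= max_age_s
    )


if __name__ == "__main__":
    now = 1_000_000.0
    reports = {
        "worker-a": Report(is_busy=True, reported_at=now - 5),     # alive, busy
        "worker-b": Report(is_busy=False, reported_at=now - 3),    # alive, idle
        "worker-c": Report(is_busy=True, reported_at=now - 3600),  # died while busy
    }
    print(count_busy(reports, now))
```

The trade-off is that a job must keep sending heartbeats while it runs, since jobs can last hours and a silent but healthy instance would otherwise be counted as gone.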

This cannot possibly be very hard or very uncommon (either that, or I cannot think of the right terms to search for :( ). Any suggestions?

According to the documentation [Scaling based on Cloud Monitoring metrics](https://cloud.google.com/compute/docs/autoscaler/scaling-stackdriver-monitoring-metrics), "You can create [custom metrics](https://cloud.google.com/monitoring/custom-metrics) using Cloud Monitoring and write your own monitoring data to the Monitoring service" and "create an autoscaler that uses Cloud Monitoring metrics". — Serhii Rohoza, Dec 02 '20 at 08:11
Also, according to [Autoscaling groups of instances](https://cloud.google.com/compute/docs/autoscaler), you can use [multiple policies](https://cloud.google.com/compute/docs/autoscaler/multiple-policies) for autoscaling a MIG. — Serhii Rohoza, Dec 02 '20 at 08:11
For example, see this Qwiklabs lab: [Autoscaling an Instance Group with Custom Cloud Monitoring Metrics](https://www.qwiklabs.com/focuses/611?parent=catalog). — Serhii Rohoza, Dec 02 '20 at 08:16
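As a rough illustration of what "writing your own monitoring data" could look like for this case, the sketch below builds a request body in the shape of the Cloud Monitoring `projects.timeSeries.create` REST call for a per-instance gauge. The metric type `custom.googleapis.com/worker/is_busy` and the function name are hypothetical. The sketch only constructs the JSON-style dictionary; authentication and the actual HTTP request are left out, and the field layout should be checked against the current API reference before use.

```python
from datetime import datetime, timezone

# Hypothetical metric name, chosen only for this example.
METRIC_TYPE = "custom.googleapis.com/worker/is_busy"


def build_time_series_body(project_id: str, zone: str, instance_id: str,
                           is_busy: bool, when: datetime) -> dict:
    """Build a timeSeries.create body reporting is_busy for one GCE instance.

    `when` must be timezone-aware; it is converted to UTC and rendered
    as an RFC 3339 timestamp.
    """
    end_time = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "timeSeries": [
            {
                "metric": {"type": METRIC_TYPE},
                # Attach the point to the instance itself, so the value is
                # reported per instance.
                "resource": {
                    "type": "gce_instance",
                    "labels": {
                        "project_id": project_id,
                        "instance_id": instance_id,
                        "zone": zone,
                    },
                },
                "points": [
                    {
                        "interval": {"endTime": end_time},
                        # int64 values are encoded as strings in the JSON form.
                        "value": {"int64Value": str(int(is_busy))},
                    }
                ],
            }
        ]
    }


if __name__ == "__main__":
    body = build_time_series_body(
        "my-project", "us-central1-a", "1234567890", True,
        datetime(2020, 12, 2, 8, 11, tzinfo=timezone.utc),
    )
    print(body["timeSeries"][0]["points"][0])
```

The project, zone, and instance ID in the usage example are placeholders; on a real instance they would typically come from the metadata server.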

