I have a managed instance group with autoscaling.
Each instance runs a program that polls a remote site asking "do you have any work for me?" If the answer is "no", it goes to sleep for a few seconds and repeats. We consider this an idle instance. Otherwise, it receives instructions for what to do, marks itself busy, does what it has to do (it can take anywhere from a few minutes to a few hours), returns the results, and goes back to being idle.
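To make the idle/busy cycle concrete, here is a minimal sketch of the loop each instance runs. All four callables (`fetch_work`, `run_job`, `send_results`, `set_busy`) are hypothetical placeholders rather than the real program's API, and `max_polls` and `sleep` exist only so the loop can be exercised in a test.

```python
import time
from typing import Callable, Optional


def worker_loop(
    fetch_work: Callable[[], Optional[dict]],  # hypothetical: returns a job, or None for "no work"
    run_job: Callable[[dict], object],         # hypothetical: minutes to hours of actual work
    send_results: Callable[[object], None],    # hypothetical: returns results to the remote site
    set_busy: Callable[[bool], None],          # hypothetical: records the idle/busy state
    poll_interval: float = 5.0,
    max_polls: Optional[int] = None,           # None means poll forever
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    polls = 0
    set_busy(False)  # an instance starts out idle
    while max_polls is None or polls < max_polls:
        polls += 1
        job = fetch_work()
        if job is None:
            # Nothing to do: stay idle, sleep a few seconds, ask again.
            sleep(poll_interval)
            continue
        set_busy(True)
        try:
            send_results(run_job(job))
        finally:
            # Go back to idle even if the job raised an exception.
            set_busy(False)
```

Note that the `finally` clause only covers failures inside the process. If the whole instance dies mid-job, nothing runs to mark it idle again.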
I want the autoscaler to make sure there is always at least one idle instance, so that it can pick up any available work. Scaling cannot be CPU-based, because jobs can spend significant amounts of time using very little CPU, or may not have enough parallelism to use all cores, and so on.
If the autoscaler could scrape an arbitrary metrics server for a particular metric, life would be simple. Each instance is already running a Prometheus node exporter, so it could export a metric such as is_busy, set to 0 or 1. Prometheus could then sum is_busy across all instances, add 1, and expose the result as the metric for the autoscaler to use. But this is not possible.
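For illustration, this is roughly what I have in mind, as a sketch only. It assumes the node exporter's textfile collector is enabled and pointed at `directory`. The file name `is_busy.prom` and both function names are made up for this example. The target size is simply the number of busy instances plus one spare (in PromQL terms, `sum(is_busy) + 1`).

```python
import os
import tempfile


def write_is_busy(busy: bool, directory: str, filename: str = "is_busy.prom") -> str:
    """Atomically write the is_busy gauge in Prometheus text exposition format."""
    body = (
        "# HELP is_busy 1 if this instance is working on a job, 0 if idle.\n"
        "# TYPE is_busy gauge\n"
        f"is_busy {int(busy)}\n"
    )
    # Write to a temporary file first, then rename it into place, so the
    # exporter never reads a half-written file. The textfile collector only
    # picks up files ending in .prom, so the .tmp file is ignored.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(body)
    # mkstemp creates the file as 0600; loosen it in case the exporter
    # runs as a different user (an assumption about the setup).
    os.chmod(tmp_path, 0o644)
    final_path = os.path.join(directory, filename)
    os.replace(tmp_path, final_path)
    return final_path


def desired_instances(is_busy_values) -> int:
    """Number of busy instances plus one spare idle instance."""
    return sum(is_busy_values) + 1
```

With three instances reporting `[1, 0, 1]`, `desired_instances` gives 3: two busy instances plus one idle spare.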
Google's documentation on using Prometheus metrics with the autoscaler, even though it is linked from the GCE MIG page, only covers Kubernetes autoscaling, which, of course, is not what I am using here.
I have already thought about having each instance create and update a custom Stackdriver metric, but if an instance dies before resetting its metric to 0, the value will never be reset, and the autoscaler will not know that the instance is gone.
This cannot possibly be very hard or very uncommon (either that, or I cannot think of the right terms to search for :( ). Any suggestions?