We have a streaming pipeline with autoscaling enabled. Generally one worker is enough to process the incoming data, but we want the number of workers to increase automatically if a backlog builds up.
Our pipeline reads from Pub/Sub and writes batches to BigQuery using load jobs every 3 minutes. We started this pipeline with one worker, while publishing twice as much data to Pub/Sub as a single worker could consume. After 2 hours, autoscaling had still not kicked in, so the backlog would have been about 1 hour's worth of data. This seems rather poor given that autoscaling aims to keep the backlog under 10 seconds (according to this SO answer).
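For context, here is roughly the shape of the pipeline, as a simplified sketch in the Beam Python SDK. All project, subscription, and table names below are placeholders, and the real job has additional parsing/transform steps:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder project, bucket, subscription, and table names.
options = PipelineOptions(
    flags=[],
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    streaming=True,
    autoscaling_algorithm="THROUGHPUT_BASED",  # autoscaling enabled
    max_num_workers=10,
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/my-sub")
        | "ParseJson" >> beam.Map(json.loads)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:my_dataset.my_table",
            method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
            triggering_frequency=180,  # one load job every 3 minutes
            # The destination table already exists, so no schema is passed.
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```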
The document here says that autoscaling for streaming jobs is in beta, and is known to be coarse-grained if the sinks are high-latency. And yeah, I guess doing BigQuery load jobs every 3 minutes counts as high-latency! Is there any progress being made on improving this autoscaling algorithm?
Are there any workarounds we can apply in the meantime, such as measuring throughput at a different point in the pipeline? I couldn't find any documentation on how throughput is reported to the autoscaling system.