Connection bias in Cloud Run using HTTP/2 behind GCLB
Our backend is configured with GCLB, a serverless NEG, and Cloud Run (HTTP/2). The chart shows access counts per container, recorded with our own log-based metrics.
The containers clearly fall into groups, and the number of requests differs greatly from one group to another. (Container groups that have just been launched do not receive enough requests.) We did not see this when HTTP/1 was used for the connection between GCLB and Cloud Run.
During this time the application logs look normal (few 503s), so nothing at the application layer is limiting requests. (The metrics are based on the standard request log, so they only record whether or not a request was made.)
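To put a number on the bias rather than reading it off the chart, something like the following could be run over the request log. This is only a minimal sketch: it assumes the log has already been reduced to one container instance ID per request, and the function names and the sample IDs are hypothetical, not part of our actual setup.

```python
from collections import Counter


def request_shares(instance_ids):
    """Map each instance ID to its share of all logged requests."""
    counts = Counter(instance_ids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no requests in the log sample")
    return {iid: n / total for iid, n in counts.items()}


def imbalance_ratio(instance_ids):
    """Busiest instance's request count divided by the quietest one's.

    Instances that received no requests at all never appear in the
    request log, so they are not reflected in this ratio.
    """
    counts = Counter(instance_ids)
    if not counts:
        raise ValueError("no requests in the log sample")
    return max(counts.values()) / min(counts.values())


# Hypothetical sample: one instance ID per logged request.
sample = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
shares = request_shares(sample)
ratio = imbalance_ratio(sample)
```

A ratio close to 1 would mean requests are spread evenly; a large ratio matches the grouping visible in the chart.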
How should this be remedied?
The Cloud Run settings are as follows:
    metadata:
      name: app
      annotations:
        run.googleapis.com/client-name: gcloud
        run.googleapis.com/client-version: 411.0.0
        autoscaling.knative.dev/minScale: '4'
        run.googleapis.com/execution-environment: gen2
        autoscaling.knative.dev/maxScale: '200'
        run.googleapis.com/cpu-throttling: 'false'
        run.googleapis.com/startup-cpu-boost: 'true'
    spec:
      containerConcurrency: 100
      timeoutSeconds: 300
      containers:
      - ports:
        - name: h2c
          containerPort: 8080
I have tried changing the containerConcurrency value, but this does not improve things much. The imbalance remains: containers that have just been launched appear to receive only a few requests, or none at all.