Why is Cloud Run's container count so much higher than it should be?

Question

I have created a new Cloud Run service with max concurrency set to 50 requests per container, but in production each container has been hovering at a max of 2-3 req/s. I know Cloud Run aims to keep CPU utilization around 60%, but I've increased the allocation from 1 vCPU to 4 vCPU and I'm still not seeing the single container I would expect for my current load of 0.75 req/s. I also tried "always allocated" CPU, and it didn't reduce the active instance count.

What is going on? Is there any way I can get it to stick to the max I set? It's going to cost hundreds of dollars extra if it keeps scaling like this, and I haven't even turned on all the requests yet.

Alternative Question: Since costs are only incurred while requests are being handled, perhaps I'm not charged for idle containers and the number of active containers doesn't matter?

PS: This is a headless scraper service, so it runs headless Chrome, which requires a fair amount of CPU to start up, but each additional tab doesn't add substantially to the CPU requirement.

PPS: Also, any recommended tips for keeping the container count low are appreciated. I added a min active instance of 1, but that's about all I considered.

score 3 · Answer 1 · answered Apr 21 '23 at 18:38

There are several factors that may affect Cloud Run concurrency.

Maximum concurrent requests per instance (services)
- Cloud Run provides a maximum concurrent requests per instance setting that specifies the maximum number of requests that can be processed simultaneously by a given container instance.
- By default, each Cloud Run container instance can receive up to 80 requests at the same time; you can increase this to a maximum of 1000. You can also lower the maximum concurrency if needed: for example, if your code cannot handle parallel requests, set concurrency to 1.

CPU allocation (services)
- Cloud Run autoscales on CPU utilization only while requests are being processed, aiming to keep instances at about 60% CPU utilization. The concurrency limit above can also trigger a scale-out on its own.
- If you select CPU always allocated and perform background activities outside of requests, Cloud Run will not scale out even if CPU usage is over the 60% threshold, and in some cases a container instance might become too busy to accept incoming requests.

Maximum number of container instances (services)
- Since you've already set your minimum instances to 1, it would also be a best practice to set a maximum number of instances, which limits how far your service scales out in response to incoming requests. Note that this maximum can be exceeded for a brief period in circumstances such as traffic spikes.
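
To see how these three settings interact, here is a rough back-of-the-envelope sketch in Python. It is not Cloud Run's actual autoscaling algorithm, just a simplified model: the function name `estimate_instances`, the 10 s request latency, and the 2 CPU-seconds per request are all made-up assumptions for illustration, so plug in your own measurements.

```python
import math

def estimate_instances(req_per_s, avg_latency_s, max_concurrency,
                       cpu_s_per_request, vcpus, max_instances,
                       cpu_target=0.60):
    """Rough instance-count estimate; NOT Cloud Run's real autoscaler."""
    # Little's law: requests in flight = arrival rate * time each stays open.
    in_flight = req_per_s * avg_latency_s
    by_concurrency = math.ceil(in_flight / max_concurrency)

    # CPU-seconds consumed per second, spread so that each instance
    # stays near the assumed 60% utilization target.
    cpu_demand = req_per_s * cpu_s_per_request
    by_cpu = math.ceil(cpu_demand / (cpu_target * vcpus))

    # Whichever limit is hit first drives the count; max_instances caps it.
    return min(max(1, by_concurrency, by_cpu), max_instances)

# Assumed numbers: 0.75 req/s (from the question), 10 s per scrape,
# 2 CPU-seconds per scrape, concurrency 50, at most 5 instances.
one_vcpu = estimate_instances(0.75, 10, 50, 2, vcpus=1, max_instances=5)
four_vcpu = estimate_instances(0.75, 10, 50, 2, vcpus=4, max_instances=5)
print(one_vcpu, four_vcpu)
```

With these assumed numbers the concurrency limit is nowhere near being reached (about 7.5 requests in flight against a limit of 50), so in this model CPU is what drives the instance count. The model averages CPU over time and ignores bursts such as headless Chrome startup, which can push utilization over the target even when the average looks low.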

These are just some of the factors that could affect concurrency. Hope this helps.
