I have created a new Cloud Run service set for a max of 50 requests per second (concurrency), but in production it's been hovering at a max of 2-3 req/s/container. I know Cloud Run aims to keep CPU around 60%, but I've increased it from 1 vCPU to 4 vCPU and I'm still not seeing the single container I would expect for my current load of 0.75 req/s. I also tried "always allocated" CPU, and it didn't reduce the active instance count.

What is going on? Is there any way I can get it to stick to the max I set? It's going to cost hundreds of dollars extra if it keeps scaling like this, since I didn't even turn on all the requests yet.

Alternative question: since costs only accrue during request allocation, perhaps I'm not charged and the number of active containers doesn't matter?

PS: This is a headless scraper service, so it's going to be running headless Chrome, which requires a fair amount of CPU to start up, but each additional tab doesn't add substantially to the CPU requirement.

PPS: Also, any tips for keeping the container count low are appreciated. I added a min active instance of 1, but that's about all I've considered.

Kevin Danikowski

1 Answer

There are several factors that may affect Cloud Run concurrency.

  • Maximum concurrent requests per instance (services)

    • Cloud Run provides a maximum concurrent requests per instance setting that specifies the maximum number of requests that can be processed simultaneously by a given container instance.

    • By default, each Cloud Run container instance can receive up to 80 requests at the same time; you can increase this to a maximum of 1000. You can also lower the maximum concurrency if needed. For example, if your code cannot handle parallel requests, set concurrency to 1.

  • CPU allocation (services)

    • Cloud Run scales out when CPU utilization during request processing exceeds 60%, so a CPU-heavy workload can trigger new instances well before the concurrency limit is reached.

    • If you select CPU always allocated and perform background activities outside of requests, Cloud Run will not scale out even if CPU usage is over the 60% threshold, and in some cases a container instance might become too busy to accept incoming requests.

  • Maximum number of container instances (services)

    • Since you've already set the minimum number of instances to 1, it would also be a best practice to set a maximum number of instances, which limits how far your service scales in response to incoming requests. Note that this maximum can be exceeded for a brief period, for example during traffic spikes.
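To illustrate how these factors interact, here is a rough sketch in Python. It is a simplified model and not Cloud Run's actual autoscaling algorithm. The function names, the 8-second request duration, and the 95% CPU reading are assumptions made up for illustration; only the 0.75 req/s load, the concurrency of 50, and the 60% CPU target come from the discussion above. It also shows that concurrency counts requests in progress at the same time, which is not the same thing as requests per second.

```python
import math


def in_flight_requests(requests_per_second: float, seconds_per_request: float) -> float:
    """Little's law: average requests in progress = arrival rate x time per request."""
    return requests_per_second * seconds_per_request


def instances_for_concurrency(in_flight: float, max_concurrency: int) -> int:
    """Instances needed so that no instance exceeds its concurrency limit."""
    return max(1, math.ceil(in_flight / max_concurrency))


def instances_for_cpu(current_instances: int, avg_cpu_utilization: float,
                      target: float = 0.60) -> int:
    """Instances needed to bring average CPU back to the target (simplified assumption)."""
    return max(1, math.ceil(current_instances * avg_cpu_utilization / target))


# 0.75 req/s is from the question; 8 s per scrape is a hypothetical duration.
in_flight = in_flight_requests(0.75, 8.0)

# Concurrency of 50 is from the question.
by_concurrency = instances_for_concurrency(in_flight, 50)

# Hypothetical reading: one instance running at 95% CPU while serving requests.
by_cpu = instances_for_cpu(1, 0.95)

# In this model, whichever signal asks for more instances wins.
estimate = max(by_concurrency, by_cpu)
print(in_flight, by_concurrency, by_cpu, estimate)
```

With these assumed numbers, the concurrency limit alone would be satisfied by a single instance, while the CPU signal asks for a second one. That would be consistent with a CPU-heavy headless Chrome workload scaling out at a low request rate, although you would need to check the CPU utilization metric for your own service to confirm it.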

These are just some of the factors that could affect concurrency. Hope this helps.

Robert G