I created a new Cloud Run service with a maximum concurrency of 50 requests per container, but in production each container has been hovering at a maximum of only 2-3 requests. I know Cloud Run aims to keep CPU utilization around 60%, but I've increased the CPU allocation from 1 vCPU to 4 vCPU and I'm still not seeing the single container I would expect for my current load of 0.75 req/s. I also tried "always allocated" CPU, and it didn't reduce the active instance count.
What is going on? Is there any way to make it stick to the maximum concurrency I set? If it keeps scaling like this, it will cost hundreds of dollars extra, and I haven't even turned on all the requests yet.
Alternative question: since I'm only charged while CPU is allocated during request processing, perhaps I'm not charged for the extra containers and the number of active containers doesn't matter?
PS: This is a headless scraper service, so it runs headless Chrome, which needs a fair amount of CPU to start up, but each additional tab doesn't substantially increase the CPU requirement.
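For reference, the "one container" expectation can be sanity-checked with a back-of-the-envelope estimate. Little's law gives the average number of in-flight requests (arrival rate times latency), and the average CPU demand is the arrival rate times the CPU-seconds each request consumes. The sketch below is a simplified model, not Cloud Run's actual autoscaling algorithm: it assumes the ~60% target mentioned above applies to both concurrency and CPU, and the 20 s latency and 2 CPU-seconds per request are hypothetical placeholders, not measurements from this service.

```python
import math


def estimate_instances(
    arrival_rate_rps: float,
    avg_latency_s: float,
    max_concurrency: int,
    cpu_seconds_per_request: float,
    vcpus_per_instance: float,
    target_utilization: float = 0.6,
) -> int:
    """Rough instance estimate; a simplified model, NOT Cloud Run's real autoscaler."""
    # Little's law: average in-flight requests = arrival rate * time in system.
    in_flight = arrival_rate_rps * avg_latency_s
    # Instances needed if each one is kept at ~60% of its max concurrency (assumption).
    by_concurrency = math.ceil(in_flight / (max_concurrency * target_utilization))
    # Average CPU demand in vCPUs = requests/s * CPU-seconds burned per request.
    cpu_demand = arrival_rate_rps * cpu_seconds_per_request
    # Instances needed if each one is kept at ~60% CPU utilization (assumption).
    by_cpu = math.ceil(cpu_demand / (vcpus_per_instance * target_utilization))
    # Whichever signal demands more instances wins; never fewer than one.
    return max(1, by_concurrency, by_cpu)


if __name__ == "__main__":
    # Hypothetical workload: 0.75 req/s, 20 s per scrape, 2 CPU-seconds per request.
    for vcpus in (1, 2, 4):
        print(vcpus, estimate_instances(0.75, 20, 50, 2.0, vcpus))
```

With these placeholder numbers, concurrency alone calls for one instance (15 in-flight requests against a target of 30), but at 1 vCPU the CPU signal calls for three. Because this model only looks at averages, short CPU bursts such as Chrome start-up could push instance counts higher than it predicts.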
PPS: Any recommended tips for keeping the container count low are also appreciated. I set a minimum instance count of 1, but that's all I've tried so far.