I have a web application. The cold start time of the backend service is about 10 seconds, which is very high, and I have not been able to reduce it. As a second solution, I am wondering whether requests that trigger a Cloud Run scale-up can be handled by already running instances. Once the newly scaled containers are ready, new requests would be routed to them. Does Google Cloud support that?
2 Answers
There is a brand new feature for that: health probes. You can configure a probe on your service to detect when an instance is ready to serve traffic (or unhealthy), so that no requests are routed to it until it is ready.
Give it a try; it should solve your issue!
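As a minimal sketch, a startup probe is declared in the service's YAML and applied with `gcloud run services replace service.yaml`. The service name, image path, and `/healthz` endpoint below are placeholders; the probe only works if your app actually serves that path:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-backend                            # placeholder service name
spec:
  template:
    spec:
      containers:
      - image: gcr.io/PROJECT_ID/my-backend   # placeholder image
        startupProbe:
          httpGet:
            path: /healthz                    # assumes your app exposes this endpoint
            port: 8080
          periodSeconds: 2
          failureThreshold: 10                # allows roughly 20s of startup time
          timeoutSeconds: 1
```

Until the probe succeeds, Cloud Run does not route traffic to the new instance.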

- My health probes are not the issue; my cold start really is 10 seconds even with health checks. What I mean is: forward the requests that cause the scale-up to already-ready containers until the new containers start (10 seconds to start). Once a container is healthy and ready for requests, it can receive new ones. – Burak Berk Feb 03 '23 at 14:28
- I didn't catch your comment. Can you rephrase, please? – guillaume blaquiere Feb 03 '23 at 20:13
- Is it possible on Google Cloud Run for new requests that cause the service to scale up to be handled by already running instances instead of newly scaled instances, to avoid the 10-second cold start? After the new containers start, they will be able to handle requests. – Burak Berk Feb 04 '23 at 21:04
- Did you try configuring a startup probe? https://cloud.google.com/run/docs/configuring/healthchecks#tcp-startup-probe – guillaume blaquiere Feb 05 '23 at 10:54
"As a second solution, I am wondering whether requests that trigger a Cloud Run scale-up can be handled by already running instances."
I think what you really want is min-instances. This means you will always have an instance ready to serve requests.
Otherwise, I don't think there is any solution to the problem you have. If new requests come in, you are going to need to scale up either way, and there is no way around the 10-second cold start. So configure min-instances with a baseline that is appropriate for your traffic.
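As a sketch, min instances can be set in the service YAML via the `autoscaling.knative.dev/minScale` annotation on the revision template (the same setting as the `--min-instances` flag); the service name and count below are placeholders:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-backend                          # placeholder service name
spec:
  template:
    metadata:
      annotations:
        # Keep at least 2 warm instances so most scale-up traffic
        # lands on an already-started container.
        autoscaling.knative.dev/minScale: "2"
```

Note that warm instances are billed even when idle, so the count is a cost/latency trade-off.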

- Yes, min instances work for the first requests and the first container: users don't have to wait 10 seconds (the cold start). But for the second container, users still have to wait 10 seconds. I asked whether their requests could be handled by the first (already deployed and healthy) container. After the second container is healthy and ready for requests, new requests would be forwarded to it, and so on. – Burak Berk Feb 14 '23 at 06:16
- I thought that's how it works already, unless you are hitting the max concurrency limit on the first container, in which case there is no option but to hold the request until the second container is ready. Regardless, I think you have to weigh the options between a higher min-instances count and potentially more cold starts. No matter how you design it, you will bottleneck the first container and have to wait for the second one either way, unless Cloud Run knew exactly how fast your traffic was going to scale up or down, which is near impossible. – cvu Feb 14 '23 at 17:45
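The concurrency limit mentioned above corresponds to the `containerConcurrency` field in the service spec; raising it lets an already-warm instance absorb more of the requests that would otherwise trigger, and then wait on, a scale-up. A sketch with a placeholder service name:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-backend          # placeholder service name
spec:
  template:
    spec:
      # Each instance may serve up to 80 concurrent requests (Cloud Run's
      # default); additional instances are started only beyond that point.
      containerConcurrency: 80
```

Whether a higher value helps depends on whether the app is CPU-bound per request; an I/O-bound service tolerates higher concurrency better.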