3

I am using

  • App Engine Flexible, custom runtime
  • nodejs as the base image
  • express
  • Cloud Tasks for queuing the requests
  • a puppeteer job

My Requirements

  • 20GB RAM
  • long-running process

Because of this unique requirement, I want one request to be handled by only one instance. Only when the instance becomes free, or the request times out, should it receive a new request. I have managed to reject other requests while an instance is processing a request, but I am not able to figure out the appropriate automatic scaling settings.

Please suggest the best way to achieve this. Thanks in advance!

Sanyam Jain
  • 2,925
  • 2
  • 23
  • 30
  • What's a long running process (how many minutes/hour)? Why do you use AppEngine flexible? Binaries required? Language not supported in standard? – guillaume blaquiere Sep 30 '19 at 20:30
  • My requirement is ~20 min per request. App Engine Flexible because it can provide a 20 GB RAM machine and is managed. Standard, Cloud Functions, and even Cloud Run support a max of only 2 GB RAM. – Sanyam Jain Oct 01 '19 at 14:14

2 Answers

5

In your app.yaml, try restricting `max_num_instances` and `max_concurrent_requests` under `automatic_scaling`.
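A minimal sketch of such an app.yaml for the flexible environment (the instance counts and the `resources` values below are illustrative assumptions, not values from the question, apart from the 20 GB RAM requirement; the comments under this answer also question whether `max_concurrent_requests` is honored by the flexible environment):

```yaml
# app.yaml (sketch) - App Engine flexible environment, custom runtime
runtime: custom
env: flex

automatic_scaling:
  min_num_instances: 1
  max_num_instances: 5        # cap how far the service can scale out
  max_concurrent_requests: 1  # aim for one request per instance

resources:
  cpu: 4
  memory_gb: 20               # the 20 GB RAM requirement from the question
```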

I also recommend rate limiting your Cloud Tasks queue in order to reduce unnecessary attempts to send requests. You may also want to increase the queue's `min_backoff` for retry attempts, to spread out retried requests as well.
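These queue limits can be set from the CLI; a hedged sketch (the queue name and all values are illustrative, the flags are from the `gcloud tasks queues update` reference):

```shell
# Illustrative queue settings: throttle dispatch rate and concurrency,
# and back off for at least a minute before retrying a rejected task.
gcloud tasks queues update my-puppeteer-queue \
  --max-concurrent-dispatches=5 \
  --max-dispatches-per-second=0.1 \
  --min-backoff=60s \
  --max-attempts=10
```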

Your task queue will keep dispatching tasks at the rate you have set, so if your instance rejects a request, the task goes into a retry pattern. You seem focused on the scaling of App Engine, but your issue is with Cloud Tasks. You may want to schedule your tasks so they fire at the interval you want.

Averi Kitsch
  • 876
  • 5
  • 16
  • If you haven't already, I'd also recommend taking a look at [`max_idle_instances`](https://cloud.google.com/appengine/docs/standard/nodejs/config/appref#max_idle_instances) and [`min_idle_instances`](https://cloud.google.com/appengine/docs/standard/nodejs/config/appref#min_idle_instances). Setting those elements correctly can be beneficial for your app, as it would be prepared to respond to a sudden load spike. – Miguel Sep 26 '19 at 07:18
  • How would restricting `max_num_instances` help? Wouldn't it just send requests to the same already-loaded instances again? And `max_concurrent_requests` is not mentioned here: https://cloud.google.com/appengine/docs/flexible/nodejs/reference/app-yaml But I've tried it and concluded that it doesn't help. – Sanyam Jain Sep 26 '19 at 14:50
  • I'm guessing the `max_num_instances` recommendation is so you don't scale out to too many instances (given that you only handle one request per instance). `max_concurrent_requests` is not mentioned there, but it is mentioned here -> https://cloud.google.com/appengine/docs/flexible/nodejs/how-requests-are-handled#handling_requests , so it should work on Flex too. Can you be more specific on why it doesn't help? What is the behavior it has, and what were you expecting to happen? – Mayeru Sep 30 '19 at 12:01
  • Though you're focused on App Engine settings, it seems like you may want to look into scheduling your tasks so they fire at set times (i.e. at longer intervals, if you know approximately how long it takes to process each task). I've updated my answer to reflect this. – Averi Kitsch Sep 30 '19 at 16:12
  • @Mayeru `max_concurrent_requests` doesn't help because App Engine doesn't wait for an instance to finish its job; it sends a second request to an instance that is already processing the first. @AveriKitsch Suppose my task takes 10 minutes; I can schedule tasks at an 11-minute interval, but then the requests won't be parallelized. It will be a sequential queue. What I am looking for is a parallel, managed solution :) – Sanyam Jain Oct 01 '19 at 14:21
  • I see, have you considered Jofre's approach posted on the other answer? (it uses an App Engine settings focused approach, which seems to be what you are looking for). – Mayeru Oct 02 '19 at 15:24
4

You could set readiness checks on your app.

When an instance is handling a request, have the readiness check return a non-ready status; 429 (Too Many Requests) seems like a good option.

This should avoid traffic to that specific instance.

Once the request is finished, return a 200 from the readiness endpoint to signal that the instance is ready to accept a new request.

However, I'm not sure how this will work with the auto-scaling options. Since the app only scales up once average CPU exceeds the defined threshold, if all instances are occupied but stay below that threshold, the load balancer won't know where to route requests (no instances are ready), and it won't scale up.

You could play around a little bit with this idea and manual scaling, or by programmatically changing `min_instances` (in automatic scaling) through the GAE Admin API.
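As a hedged sketch of that Admin API route (the project, service, version, and instance count are placeholders; the Admin API exposes this via `apps.services.versions.patch` with an `updateMask`, but verify the exact field names against the current reference before relying on it):

```shell
# Raise the instance floor for a flexible-environment version (sketch).
curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://appengine.googleapis.com/v1/apps/MY_PROJECT/services/default/versions/MY_VERSION?updateMask=automaticScaling.minTotalInstances" \
  -d '{"automaticScaling": {"minTotalInstances": 3}}'
```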

Be sure to always return a 200 for the liveness check, or the instance will be killed as it will be considered unhealthy.

Jofre
  • 3,718
  • 1
  • 23
  • 31