Handling signal: term (gunicorn, eventlet, Flask, Google App Engine)

Question

I am running a Flask app on Google App Engine using Gunicorn's async workers.

Every time a batch of requests comes in, after the last request has finished responding, I get the following message and my gunicorn workers exit. Then there's a slight delay when the next batch of requests comes in.

2020-05-17 16:57:14 default[20200517t125405]  [2020-05-17 16:57:14 +0000] [7] [INFO] Handling signal: term
2020-05-17 16:57:14 default[20200517t125405]  [2020-05-17 16:57:14 +0000] [7] [INFO] Handling signal: term
2020-05-17 16:57:14 default[20200517t125405]  [2020-05-17 16:57:14 +0000] [21] [INFO] Worker exiting (pid: 21)
2020-05-17 16:57:14 default[20200517t125405]  [2020-05-17 16:57:14 +0000] [20] [INFO] Worker exiting (pid: 20)
2020-05-17 16:57:14 default[20200517t125405]  [2020-05-17 16:57:14 +0000] [18] [INFO] Worker exiting (pid: 18)
2020-05-17 16:57:14 default[20200517t125405]  [2020-05-17 16:57:14 +0000] [14] [INFO] Worker exiting (pid: 14)
2020-05-17 16:57:14 default[20200517t125405]  [2020-05-17 16:57:14 +0000] [19] [INFO] Worker exiting (pid: 19)

Here is my app.yaml

runtime: python37
entrypoint: gunicorn --worker-class eventlet -c gunicorn.conf.py -b :$PORT main:app preload_app=True
instance_class: F2

Here is my gunicorn.conf.py file

import multiprocessing

workers = (multiprocessing.cpu_count()) * 2 + 1
threads = workers # originally didn't have this, just had the workers var defined, but tried this and it also didn't solve the problem

I tried searching SO and some other sources but can't find a workaround for this.

The master is issuing a TERM signal to the workers. What processing are the workers doing? It could be that the worker timeout is less than the time a worker takes to process a request. — Saurabh, May 19 '20 at 17:33
@Saurabh My flask app is responsible for fetching data from a 3rd-party API, so it does take an indeterminate amount of time. However, I set the -t timeout in my app.yaml for gunicorn to 90 seconds and I get this message even when the request finishes in well < 30 seconds... so I'm not sure if worker timeout < worker time to fulfill request is necessarily the issue. I'm not really even sure what TERM signal to workers means though? Could it be that new instances are being spun up and App Engine sends the term signal to shut down instances as traffic dies down? Documentation for GAE is unclear. — Rishi Prasad, May 19 '20 at 18:21
These are the POSIX signals; you can read more about them [here](https://en.wikipedia.org/wiki/Signal_(IPC)#List_of_signals), and the documentation is well maintained and informative! Do you mind sharing your configuration? What is the value of `--max-requests`? Can you also try without using threads? Also, can you try bumping the timeout up to 120? I also see that preload is set; I hope it's not the cause of the worker restarts. — Saurabh, May 19 '20 at 18:58
@Saurabh The config files (app.yaml, gunicorn.conf.py) are included in the original post, unless you are speaking about a different file? I don't have a value set for --max-requests on the gunicorn side, so it is currently set to the default value of 0. However, App Engine has a flag called "max_concurrent_requests" which defaults to 10. I've removed the gunicorn.conf.py file and now hardcode the number of workers according to my App Engine instance class (per GAE docs [here](https://cloud.google.com/appengine/docs/standard/python3/runtime)). — Rishi Prasad, May 19 '20 at 19:18
Also, to be honest, I'm not sure what the downsides of preload_app are. Somewhere in the GAE documentation it mentioned setting it to true if you have slow bootup times, but it didn't mention any downsides (neither do gunicorn's docs), so I'm not sure why you would or wouldn't set preload_app to True. — Rishi Prasad, May 19 '20 at 19:21
"preload_app" is a gunicorn setting, used for hot reloading whenever there is any change in code during development. Since you are deploying, there is no point in having the flag and monitoring the code. So if that's the only config, then the `timeout` is set to 30 (the default); did you try increasing it to more than 90? Also, is there any timeout setting for each request on the GAE side (I haven't worked on App Engine)? What I'm interested to know is whether GAE's timeout < gunicorn's timeout. — Saurabh, May 20 '20 at 15:26
@Saurabh GAE's timeout is 10 minutes, I believe. I'll go ahead and remove the preload_app flag; your explanation makes sense, so thanks for that. I'm not quite sure what the differences are between the instance_class types and whether that affects this (e.g. frontend vs. backend, but also within each tier of instance class). — Rishi Prasad, May 22 '20 at 04:52
Well, to debug, my recommendation is to reduce to one worker and one thread. Set the timeout to the maximum amount of time it takes to fetch data from the API, then run locally with these settings and see the behaviour. That's how you'll be able to debug. — Saurabh, May 29 '20 at 05:28
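
The debugging setup suggested in the last comment could be sketched as a gunicorn.conf.py roughly like the one below. This is only an illustration under assumptions: the `timeout` of 120 is taken from the value floated earlier in the comments rather than measured, `loglevel = "debug"` is an extra not mentioned in the thread, and the `worker_exit` hook is added just to make worker shutdowns easier to spot in the logs.

```python
# gunicorn.conf.py: debugging sketch, not a production config

workers = 1         # single worker, as recommended in the comment above
threads = 1         # note: gunicorn only applies this to the gthread worker class
timeout = 120       # assumption: upper bound (seconds) on the 3rd-party API call
loglevel = "debug"  # assumption: extra verbosity while investigating


def worker_exit(server, worker):
    """gunicorn server hook, called just after a worker has exited."""
    server.log.info("worker %s exited", worker.pid)
```

It would be run locally with something like `gunicorn -c gunicorn.conf.py main:app`, adding `--worker-class eventlet` once the single-worker behaviour is understood. One related point worth checking in the gunicorn settings documentation: `preload_app` loads the application before the workers are forked, while automatic code reloading is controlled by the separate `reload` setting.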

0 Answers