Gunicorn (with Flask) parameters for Google Cloud Run - production setup?

Question

This question was taken down on SO with a suggestion to ask here, so here goes. I'm looking for guidance from people with experience in this area. I run a Docker container (approx. 670 MB) on Google Cloud Run. Inside is my Flask-based Python server, which is currently started by this command in the Dockerfile:

CMD exec gunicorn --bind 0.0.0.0:8080 --workers=1 --threads 8 --timeout 0 "db_app.app:create_app()"

However, after reading about heartbeat checks and other things, I wonder, what is the optimal Gunicorn setup for GCR?

The production version of this server currently runs on a physical machine and serves about 30-40 requests per 30 minutes. That is expected to grow to about 100-150, maybe more.

How many workers and threads should I specify in my exec command to use GCR's capabilities most effectively? I'm talking about a small GCR setup such as 1 CPU and 1 GB RAM, maybe more in the future. This is just the testing phase, but I hope to keep costs minimal. So how should I configure Gunicorn there? Should I also use --preload, or specify --worker-connections?

The official example has 1 worker and 8 threads, but at the same time says to increase the number of workers to match the number of cores. How many cores do they have there? I hope people who run similar servers on GCR can share some knowledge about their setups.

Looks like you were able to get some help for this on SO. Just to add to this, keep in mind you can also specify the number of vCPUs assigned to Cloud Run: https://cloud.google.com/run/docs/configuring/cpu https://stackoverflow.com/questions/64897152/gunicorn-with-flask-parameters-for-google-cloud-run-gcr-what-to-put-in-doc – Oliver Aragon, Nov 19 '20 at 17:59

score 1 · Answer 1 · answered Nov 22 '20 at 19:01

This isn't only about Gunicorn; it also depends on what your API does, how long it takes to respond, and how much memory it needs to run.

Assuming it's a simple API that answers in two-digit milliseconds and doesn't need any heavy library, I think 1 worker + 4 threads should work smoothly for your traffic (100-150 requests per 30 minutes).
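To put rough numbers on that, here is a back-of-the-envelope sketch using Little's law (average requests in flight = arrival rate × time per request). The 50 ms figure is an assumed stand-in for "two-digit milliseconds", and the 2-second case is an assumed slow response for comparison. These are averages only and say nothing about bursts.

```python
def average_concurrency(requests, window_seconds, seconds_per_request):
    """Little's law: mean number of requests being served at the same time."""
    arrival_rate = requests / window_seconds  # requests per second
    return arrival_rate * seconds_per_request


# Upper end of the expected traffic: 150 requests per 30 minutes.
WINDOW = 30 * 60

fast_api = average_concurrency(150, WINDOW, 0.05)  # assumed 50 ms responses
slow_api = average_concurrency(150, WINDOW, 2.0)   # assumed 2 s responses

print(f"fast API: {fast_api:.4f} requests in flight on average")
print(f"slow API: {slow_api:.4f} requests in flight on average")
```

Even with 2-second responses the average is roughly 0.17 requests in flight, so at this traffic level the thread count matters mostly for short bursts rather than steady load.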

That said, assuming your application is thread-safe, I'd use 4 workers to have some headroom for unexpectedly higher traffic.
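One way to keep these numbers easy to change is a Gunicorn config file, sketched below. `PORT` is set by Cloud Run and `WEB_CONCURRENCY` is read by Gunicorn itself, but `GUNICORN_THREADS` is a hypothetical variable name chosen for this example, and the defaults simply mirror the 1 worker + 4 threads suggestion.

```python
# gunicorn.conf.py -- a minimal sketch, not a tuned production config.
import os

# Cloud Run passes the port to listen on via PORT; 8080 is the fallback
# used in the question's command.
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")

# Number of worker processes. Each one loads its own copy of the app,
# so memory use grows with this value.
workers = int(os.environ.get("WEB_CONCURRENCY", "1"))

# Threads per worker. GUNICORN_THREADS is a made-up name for this sketch.
threads = int(os.environ.get("GUNICORN_THREADS", "4"))

# 0 disables Gunicorn's worker timeout, as in the question's command.
timeout = 0
```

It would be used as `gunicorn -c gunicorn.conf.py "db_app.app:create_app()"`, and the values can then be changed per deployment through environment variables instead of rebuilding the image.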

surfingonthenet

Thanks for the tip. In fact, it does use a heavy library, and answers can take up to 1-2 seconds in real time. So far I've left it with 3 workers and 8 threads. – Amy Wong Nov 23 '20 at 17:55
If it's possible in your scenario, consider creating a Pub/Sub topic + Cloud Function to execute the heavy task. That is, the user hits the endpoint, which runs the main flow and publishes to a Pub/Sub topic that triggers a Cloud Function to execute the heavy task. This way your API can answer faster. – surfingonthenet Nov 24 '20 at 21:47
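The publishing side of that idea might look roughly like the sketch below. It assumes the `google-cloud-pubsub` client library is installed. The function names, the message layout (`task_id`, `params`), and the project and topic IDs are all hypothetical, and the Cloud Function that consumes the message is not shown.

```python
import json


def make_message(task_id, params):
    """Serialize the task description; Pub/Sub message data must be bytes."""
    return json.dumps({"task_id": task_id, "params": params}).encode("utf-8")


def enqueue_heavy_task(publisher, topic_path, task_id, params):
    """Publish the task and return the message ID assigned by Pub/Sub."""
    future = publisher.publish(topic_path, data=make_message(task_id, params))
    return future.result()  # blocks until the publish is acknowledged


def build_publisher(project_id, topic_id):
    """Create a real client; imported lazily so the rest works without it."""
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    return publisher, publisher.topic_path(project_id, topic_id)
```

Inside the Flask view, the handler would call `enqueue_heavy_task(...)` and return right away (for example with a 202 status), leaving the 1-2 second computation to the function triggered by the topic.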
