This question was taken down on SO with a suggestion to ask here, so here goes: I'm looking for some guidance from people with experience in this area. I run a Docker container (approx. 670 MB in size) in Google Cloud Run (GCR). Inside is my Flask-based Python server, which is currently run by this command in the Dockerfile:
CMD exec gunicorn --bind 0.0.0.0:8080 --workers=1 --threads 8 --timeout 0 "db_app.app:create_app()"
However, after reading about heartbeat checks and other things, I wonder what the optimal Gunicorn setup for GCR is.
Currently, the production version of this server runs on a physical machine and serves about 30-40 requests per 30 minutes. That is expected to grow to about 100-150 requests per 30 minutes, maybe more.
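To put those numbers in perspective, here is a rough back-of-the-envelope calculation. The 2-second average request duration is purely an assumption for illustration, not something I have measured; the function and variable names are made up for this sketch.

```python
def requests_per_second(requests, minutes):
    """Convert 'N requests per M minutes' into requests per second."""
    return requests / (minutes * 60)

# Figures from the question: up to 40 requests per 30 min now,
# up to 150 requests per 30 min expected later.
current_rps = requests_per_second(40, 30)
expected_rps = requests_per_second(150, 30)

# ASSUMPTION: average request duration in seconds (hypothetical value).
assumed_latency_s = 2.0

# Little's law: average requests in flight = arrival rate * time in system.
avg_in_flight = expected_rps * assumed_latency_s

print(f"current:  {current_rps:.3f} req/s")
print(f"expected: {expected_rps:.3f} req/s")
print(f"average requests in flight (assumed latency): {avg_in_flight:.2f}")
```

Under that assumption the average concurrency stays well below the 8 threads of a single worker, although bursts could of course look different from the average.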
How many workers and threads should I specify in my exec command to use GCR's capabilities most effectively? I'm talking about something small (in terms of GCR setup), like 1 CPU and 1 GB RAM, maybe more in the future. This is just the testing phase, and I hope to keep the costs minimal. So how should I configure Gunicorn there? Should I also use --preload? Should I specify --worker-connections?
The official example uses 1 worker and 8 threads, but at the same time says to increase the number of workers according to the number of cores. How many cores do the instances actually have? I hope people who run such or similar servers on GCR can share some knowledge about their setups.
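For reference, this is roughly how I imagine moving the flags into a Gunicorn config file, as a minimal sketch rather than a recommendation. The file name and the `pick_workers` helper are hypothetical. `bind`, `workers`, `threads` and `timeout` are regular Gunicorn settings, `PORT` is the variable Cloud Run injects, and `WEB_CONCURRENCY` is the variable Gunicorn itself reads as the default for workers. I am assuming one worker per visible CPU as a starting point, and I am not sure `os.cpu_count()` reflects the CPU limit configured for the container.

```python
# gunicorn.conf.py (hypothetical file name)
import os


def pick_workers(cpu_count, env_value=None):
    """Return the worker count: an explicit override if given,
    otherwise one worker per visible CPU, never fewer than 1."""
    if env_value:
        return max(1, int(env_value))
    return max(1, cpu_count or 1)


# Cloud Run passes the listening port in PORT; fall back to 8080.
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")

# ASSUMPTION: one worker per CPU unless WEB_CONCURRENCY overrides it.
# os.cpu_count() may report host CPUs rather than the container limit.
workers = pick_workers(os.cpu_count(), os.environ.get("WEB_CONCURRENCY"))

# Same values as in my current command line.
threads = 8
timeout = 0  # 0 disables Gunicorn's worker timeout
```

The Dockerfile command would then become something like `CMD exec gunicorn -c gunicorn.conf.py "db_app.app:create_app()"`.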