
Here I want to ask: what is the difference between running gunicorn/uvicorn yourself on a plain Python image versus using tiangolo's default image?

I have tried stress testing these using JMeter with the following thread properties:

[Screenshot: JMeter thread group properties]

From this, I got the following results:

[Screenshot: JMeter test results]

In summary, I tried three setups:

  1. Dockerfile with tiangolo base
  2. Dockerfile with python:3.8-slim-buster and run it with gunicorn command
  3. Dockerfile with python:3.8-slim-buster and run it with python

This is my Dockerfile for case 1 (Tiangolo base):

FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
RUN apt-get update && apt-get install wget gcc -y
RUN mkdir -p /app
WORKDIR /app
COPY ./requirements.txt /app/requirements.txt
RUN python -m pip install --upgrade pip
RUN pip install --no-cache-dir -r /app/requirements.txt
COPY . /app

This is my Dockerfile for case 2 (Python base with gunicorn command):

FROM python:3.8-slim-buster as builder
RUN apt-get update --fix-missing
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1-mesa-dev python3-pip git
RUN mkdir /usr/src/app
WORKDIR /usr/src/app
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip3 install -U setuptools
RUN pip3 install --upgrade pip
RUN pip3 install -r ./requirements.txt
COPY . /usr/src/app
ENTRYPOINT gunicorn --bind :8080 --workers 1 --threads 8 main:app --worker-class uvicorn.workers.UvicornH11Worker --preload --timeout 60 --worker-tmp-dir /dev/shm
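One detail worth noting in case 2: `UvicornH11Worker` uses uvicorn's pure-Python h11 HTTP implementation, which is mainly intended for PyPy and environments without compiled extensions. On CPython, the standard `UvicornWorker` (which uses httptools and uvloop when they are installed) is the usual choice. A hedged sketch of an alternative ENTRYPOINT — worker count here is illustrative, not a tuned recommendation:

```shell
# Assumption: swapping in the standard UvicornWorker; on CPython it uses
# httptools/uvloop when available instead of the pure-Python h11 parser.
# The worker count is illustrative only.
gunicorn main:app \
    --bind :8080 \
    --workers 2 \
    --worker-class uvicorn.workers.UvicornWorker \
    --timeout 60 \
    --worker-tmp-dir /dev/shm
```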

This is my Dockerfile for case 3 (Python base with python command):

FROM python:3.8-slim-buster
RUN apt-get update --fix-missing
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1-mesa-dev python3-pip git
RUN mkdir /usr/src/app
WORKDIR /usr/src/app
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip3 install -U setuptools
RUN pip3 install --upgrade pip
RUN pip3 install -r ./requirements.txt --use-feature=2020-resolver
COPY . /usr/src/app
CMD ["python3", "/usr/src/app/main.py"]
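For case 3 to serve traffic at all, main.py must start the server itself, typically with a `uvicorn.run(...)` call under an `if __name__ == "__main__":` guard. An alternative CMD that removes that requirement is to invoke the uvicorn CLI directly (assumption: the ASGI app object is named `app` inside main.py):

```shell
# Equivalent to `python3 main.py` when main.py calls
# uvicorn.run("main:app", host="0.0.0.0", port=8080) itself.
# This runs a single uvicorn process with no gunicorn supervisor.
uvicorn main:app --host 0.0.0.0 --port 8080
```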

Here I am confused: from the results above, all three methods look roughly the same. What is the difference between them, and which one is best for production? I'm sorry, I'm new to production API deployment and need some advice on this case. Thank you.

These are my Cloud Run commands:

gcloud builds submit --tag gcr.io/gaguna3/priceengine

gcloud run deploy backend-pure-python \
    --image="gcr.io/gaguna3/priceengine" \
    --region asia-southeast2 \
    --allow-unauthenticated \
    --platform managed \
    --memory 4Gi \
    --cpu 2 \
    --timeout 900 \
    --project=gaguna3
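Note that Cloud Run routes requests to the port named in the `PORT` environment variable (8080 by default), which is why the `--bind :8080` in case 2 works. Two deploy flags worth knowing when tuning — the values below are illustrative, not recommendations:

```shell
# --port changes which container port Cloud Run sends traffic to;
# --concurrency caps simultaneous requests per instance (default 80).
gcloud run deploy backend-pure-python \
    --image="gcr.io/gaguna3/priceengine" \
    --region asia-southeast2 \
    --port 8080 \
    --concurrency 80
```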
asked by MADFROST
    If you run these tests on a local environment, did you receive the same performance difference? – Jan Hernandez Feb 26 '21 at 23:22
  • 1
@JanHernandez when I try it locally, I still get the same result – MADFROST Feb 27 '21 at 03:03
  • Please test with running a single Uvicorn worker directly, instead of running it through gunicorn. I feel this is best for how Cloud Run works. – Zaffer Dec 26 '21 at 18:35
Thanks for the benchmarks and effort. I want to deploy FastAPI to serve ML models in k8s. Some questions: (1) for case 1, what launch configuration was used — how many threads, how many workers, and which worker class? Or was the same configuration used in all 3 cases? (2) After this testing, do you think a machine with more CPU threads and memory would provide better performance, and if so, what changes would be required when launching gunicorn? – Naveen Reddy Marthala Jan 15 '22 at 07:26
  • 1
And tiangolo discourages using gunicorn in k8s, recommending containers with a single uvicorn process instead. Source: https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker#-warning-you-probably-dont-need-this-docker-image. Having seen your benchmarks, I am not sure whether gunicorn should be used in k8s or not. – Naveen Reddy Marthala Jan 15 '22 at 07:27
  • 1
Having seen OP's benchmarks, I have adopted this, but I can't even serve 1000 requests every second. Please help me; more info at: https://stackoverflow.com/questions/70912912/gunicorn-doesnt-use-all-cpu-resulting-in-lot-of-failed-requests – Naveen Reddy Marthala Jan 30 '22 at 12:08

2 Answers


Based on your examples, I noticed that the first container uses a fine-tuned gunicorn setup; tiangolo's GitHub page also mentions:

This image has an "auto-tuning" mechanism included, so that you can just add your code and get that same high performance automatically.

From my perspective, this is achieved by dynamically scaling the number of gunicorn workers and/or by using CPython extension modules.

The difference between the second and third containers is the number of workers defined for the service: in the second container you have only 1 worker with 8 threads. If you tune this configuration you can improve performance, as mentioned in this article.
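The gunicorn docs suggest `(2 x CPU cores) + 1` workers as a starting point; with the 2 CPUs allocated in the Cloud Run command above, that would be 5 workers. A small sketch of computing it at container start — only a heuristic, tune against your own load tests:

```shell
# Heuristic from the gunicorn docs: workers = (2 * cores) + 1.
# nproc reports the CPU count visible to the container.
CORES=$(nproc)
WORKERS=$((2 * CORES + 1))
echo "$WORKERS"
# The result would then be passed as: gunicorn --workers "$WORKERS" ...
```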

It is not a bad idea to use tiangolo/uvicorn-gunicorn, but I recommend pinning the container version, to prevent a future upstream change from affecting your production environment.

On the other hand, using vanilla Python images allows you to customize the image without fear of breaking something, but it takes some time to reach the same performance as the tiangolo image.
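On version pinning: the most robust approach is to pin to an immutable image digest rather than a mutable tag. A sketch of resolving the digest with the standard docker CLI; the printed `sha256:...` value is then used in the Dockerfile's FROM line:

```shell
# Resolve the tag to an immutable digest, so future upstream rebuilds
# of the same tag cannot silently change what production runs.
docker pull tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
docker inspect --format '{{index .RepoDigests 0}}' \
    tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
# Then in the Dockerfile:
#   FROM tiangolo/uvicorn-gunicorn-fastapi@sha256:<printed digest>
```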

answered by Jan Hernandez
In the GitHub repo, tiangolo installs gunicorn from pip and also ships a "gunicorn_conf.py" file to configure it. Is the extra performance because of the conf file he is using? If not, what makes tiangolo's gunicorn fine-tuned, and how can I use that version in my Docker images? – Naveen Reddy Marthala Jan 16 '22 at 13:45

It was suggested that a 3x speed-up over a single uvicorn process can be obtained by running it under gunicorn with multiple workers, e.g.:

gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app

For more details and benchmarks, see: https://stackoverflow.com/a/63427961/2705777

answered by Neil