
I have a question regarding Flask, Waitress and parallel processing of HTTP requests.

I have read that Flask alone can only process one HTTP request at a time.

In the table below I have put all the possible configurations, and I would like your feedback on the number of HTTP requests that I can process in parallel.

|                        |Only Flask | Flask and Waitress|
|------------------------|-----------|-------------------|
|1 CPU & 1 core          | 1 request | 1 request         |
|1 CPU & 4 cores         | 1 request | 4 requests        |
|2 CPUs & 1 core each    | 1 request | 2 requests        |
|2 CPUs & 4 cores each   | 1 request | 8 requests        |

I ask these questions because a colleague told me that we can process several thousand HTTP requests with an Apache server using only 1 CPU and 1 core!

So, how should I handle the maximum number of HTTP requests in parallel?

  • The link you provided says that the "built-in" server that comes with Flask cannot handle the requests, not Flask itself. It gives a list of HTTP servers to use instead of it. Apache is trusted but there are several others that could be used. – didiz Feb 22 '20 at 08:12
  • You are right, and that is why I used Waitress. Could you confirm the values that I put in the table regarding the number of requests? How can we process thousands of requests in parallel with only one CPU and one core? – ZchGarinch Feb 22 '20 at 09:55
  • Short answer: no. The table you posted says only one and I would tend to believe it. A thousand concurrent requests is very high traffic. – didiz Feb 22 '20 at 10:23
  • So, how can Nginx or Apache handle such traffic with only one core? Why is it recommended to use Nginx to serve a Flask app? – ZchGarinch Feb 22 '20 at 10:46
  • Nginx can handle the requests easily but it is not an application server. Is your application stateless and how long does it take to finish one request? The only way would be for you to actually measure it. Here is one such experiment with 2400 requests per worker https://www.freecodecamp.org/news/million-requests-per-second-with-python-95c137af319/ – didiz Feb 22 '20 at 10:58
  • What do you mean by "per worker", please? Is the worker a thread? If I have one core, does this mean that I have only one thread? My application is stateless; it simply sends a custom response depending on the request parameters. Thanks – ZchGarinch Feb 22 '20 at 13:15
  • Not sure what you mean, but here is another answer on how Nginx handles requests without threads: https://stackoverflow.com/questions/3436808/how-does-nginx-handle-http-requests – didiz Feb 22 '20 at 14:30

1 Answer


Let me clear out the confusion for you.

When you are developing locally with Flask, you use the built-in server, which is single-threaded: it processes only one request at a time. This is one of the reasons why simply setting FLASK_ENV=production and running the built-in server is not enough for a production environment; the built-in server is not capable of handling those workloads. In fact, once you change FLASK_ENV to production and run it, you'll find a warning in the terminal.
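To make this concrete, here is roughly what running the built-in development server looks like (assuming your application module is called `app.py`; the filename is illustrative):

```shell
# Development only: the built-in server handles one request at a time.
export FLASK_APP=app.py
flask run
# The terminal prints a warning along the lines of
# "Do not use the development server in a production environment."
```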

Now, coming to how to run Flask in a production environment: CPUs, cores, threads, and the rest.

To run Flask in a production environment, you need a proper application server that can run your Flask application. This is where Gunicorn comes in: it is compatible with Flask and one of the most popular ways of running it.

In Gunicorn, you have different ways to configure an optimal setup based on the specs of your server. You can tune it along the following axes:

  1. Worker class - the type of worker to use
  2. Number of workers
  3. Number of threads per worker
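These three settings map directly onto Gunicorn's command-line flags. A sketch of an invocation, assuming a hypothetical module `myapp` containing a Flask object named `app`:

```shell
# gthread is the threaded worker class; --threads only has an
# effect with a threaded worker. Numbers here follow the sizing
# rule discussed below for a 4-core machine.
gunicorn --worker-class gthread --workers 9 --threads 16 myapp:app
```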

The way you calculate the maximum number of concurrent requests is as follows, taking a 4-core server as an example.

As per the documentation of Gunicorn, the optimal number of workers is suggested as (2 * num_of_cores) + 1, which in this case becomes (2 * 4) + 1 = 9.

Now, the suggested number of threads per worker is 2 to 4 x num_of_cores, which in this case comes out to at most 4 * 4 = 16.

So now, you have 9 workers with 16 threads each. Each thread can handle one request at a time, so you can have up to 9 * 16 = 144 concurrent connections.
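The sizing rule above can be written out as a small helper (a sketch of the heuristic quoted from Gunicorn's docs, not an official API; the function name is made up):

```python
def gunicorn_sizing(cores: int) -> tuple[int, int, int]:
    """Estimate workers, threads per worker, and total concurrency
    from a core count, using the (2*cores)+1 and up-to-4*cores rules."""
    workers = 2 * cores + 1
    threads_per_worker = 4 * cores  # upper end of the 2-4x range
    concurrency = workers * threads_per_worker
    return workers, threads_per_worker, concurrency

print(gunicorn_sizing(4))  # (9, 16, 144)
```

Note that these are heuristics, not hard limits; the right numbers for your app depend on whether requests are CPU-bound or I/O-bound.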

Similarly, you can do the calculation for Waitress. I prefer using Gunicorn, so you'll need to check out the Waitress docs for the equivalent configuration.
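As a rough equivalent: Waitress runs a single process with a pool of threads, so there is no worker count to tune, only the thread count. A sketch, again assuming a hypothetical `myapp:app` (the thread count is illustrative):

```shell
# One process, 16 threads: up to 16 requests handled concurrently.
waitress-serve --threads=16 myapp:app
```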

Now coming to Web Servers

Until now, what you have configured is an application server to run Flask. This works, but you shouldn't expose an application server directly to the internet. Instead, it's always suggested to deploy Flask behind a reverse proxy like Nginx. Nginx acts as a full-fledged web server capable of handling real-world workloads.
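A minimal reverse-proxy block for Nginx might look like the following sketch (it assumes the application server is listening on 127.0.0.1:8000 and uses a placeholder domain; adjust both to your setup):

```nginx
server {
    listen 80;
    server_name example.com;  # placeholder domain

    location / {
        # Forward requests to the application server (Gunicorn/Waitress).
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```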

So, in a gist, you can use a combination from the list below as per your requirements:

Flask + Application Server + Web Server, where the Application Server is one of Gunicorn, uWSGI, Gevent, Twisted Web, Waitress, etc., and the Web Server is one of Nginx, Apache, Traefik, Caddy, etc.