I am not running a webapp, but rather a Machine Learning model which needs to provide real-time predictions.
I am using Nginx with Gunicorn, both running in the same docker container. The setup uses 4 Gunicorn workers with 1 thread each (hosting 4 copies of my model) and nginx with 1 worker process.
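For reference, the Gunicorn side of this setup would look roughly like the sketch below. The file name, the bind address, and the explicit backlog line are assumptions for illustration, not my exact configuration; only the worker and thread counts come from the description above.

```python
# gunicorn.conf.py (hypothetical file name; Gunicorn config files are plain Python)

# Assumed address that nginx proxies to inside the container.
bind = "127.0.0.1:8000"

# 4 worker processes with 1 thread each, so 4 copies of the model are loaded.
workers = 4
threads = 1

# Listen backlog requested by Gunicorn (2048 is Gunicorn's default).
# On Linux the kernel silently caps this at net.core.somaxconn.
backlog = 2048
```

With these values at most 4 requests are processed at a time; anything beyond that has to wait in a queue, either in nginx or in the socket backlog.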
At the moment, this setup returns 502 errors when my client sends a burst of requests to my server. I want to avoid this, even if it means longer response times for each request.
Things I have tried:
- Increasing net.core.somaxconn from 128 to 2048: this alleviates the 502s. However, I cannot change sysctl.conf in a production environment because my docker container runs in non-privileged mode (I have no control over this, since it is managed by another team).
- Removing nginx altogether: this does work, since I don't receive traffic from the internet and I don't have to serve static content, just ML predictions coming in as HTTP POST requests. But I want to avoid this as it is not recommended by Gunicorn.
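To make the first point concrete, here is a minimal sketch of how the effective listen backlog can be checked from inside the container. It relies on the documented Linux behaviour that a backlog passed to listen() is silently capped at net.core.somaxconn; the helper names are hypothetical and the /proc path only exists on Linux.

```python
from pathlib import Path

# Linux-only location of the kernel's cap on listen backlogs.
SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the current somaxconn value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """Return the backlog the kernel will actually use for a listening socket."""
    if somaxconn is None:
        # Cap unknown (e.g. not on Linux): report the requested value unchanged.
        return requested
    return min(requested, somaxconn)


if __name__ == "__main__":
    cap = read_somaxconn()
    print("somaxconn:", cap)
    print("effective backlog for a requested 2048:", effective_backlog(2048, cap))
```

So with somaxconn at 128, asking for a backlog of 2048 still only gives a queue of 128 pending connections, which would explain why raising the sysctl helps.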
Would some of the folks here be able to help out?