I am not running a webapp, but rather a Machine Learning model which needs to provide real-time predictions.

I am using Nginx with Gunicorn, both running in the same Docker container. The setup uses 4 Gunicorn workers with 1 thread each (hosting 4 copies of my model) and Nginx with 1 worker process.
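As a rough sketch, the Gunicorn side of this setup could be written as a `gunicorn.conf.py` like the one below. The bind address is an assumption for illustration only (the actual address Nginx proxies to is not stated here), and `backlog` is shown at Gunicorn's default value rather than a value taken from the real configuration.

```python
# gunicorn.conf.py: illustrative sketch, not the actual production config.

# ASSUMPTION: Nginx proxies to this local address; the real one is not given.
bind = "127.0.0.1:8000"

# 4 worker processes, each loading its own copy of the model.
workers = 4

# 1 thread per worker, so at most 4 predictions run concurrently.
threads = 1

# Requested size of the listen queue for pending connections.
# On Linux the kernel silently caps this at net.core.somaxconn.
backlog = 2048
```

With these settings, any request beyond the 4 in flight waits in the listen queue, so the queue size determines how large a burst can be absorbed before connections are refused.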

At the moment, this setup returns 502 errors when my client sends a burst of requests to my server. I want to avoid this, even if it means longer response times for each request.

Things I have tried:

  • Increasing net.core.somaxconn from 128 to 2048: this alleviates the 502s. However, I cannot change sysctl.conf in production because my Docker container runs in non-privileged mode (I have no control over this, since it is managed by another team).
  • Removing Nginx altogether: this does work, since I don't receive traffic from the internet and I don't have to serve static content, just ML predictions requested via HTTP POST. But I want to avoid this, as running Gunicorn without a proxy in front is not recommended by Gunicorn.
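To illustrate why the first item helps, here is a small sketch of how Linux caps a listen queue. On Linux, the backlog passed to `listen()` is silently truncated to `net.core.somaxconn`, so a server asking for 2048 still gets 128 when the limit is 128. The helper names below are hypothetical, and the `/proc` path is only readable on Linux.

```python
from pathlib import Path

# Linux exposes the limit here; inside a container this is the value
# for the container's own network namespace.
SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def effective_backlog(requested, somaxconn):
    """Return the queue size actually used: Linux caps it at somaxconn."""
    return min(requested, somaxconn)


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the kernel limit, or None if it cannot be read (e.g. non-Linux)."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    limit = read_somaxconn()
    if limit is None:
        print("net.core.somaxconn is not readable on this system")
    else:
        # 2048 is the backlog the server asks for in this example.
        print("somaxconn:", limit)
        print("effective backlog for 2048:", effective_backlog(2048, limit))
```

Running this inside the container shows the limit that applies there, without needing privileges to change it.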

Would some of the folks here be able to help out?

  • Multiple load-balanced containers. Or talk to the other team about changing the sysctl. – Michael Hampton Sep 10 '20 at 12:45
  • @MichaelHampton the team that hosts the docker containers is using multiple docker containers on each server, with load balancing across servers. The issue is when the upstream (which makes the requests) suddenly sends a burst of traffic, and the autoscaling which increases the number of servers cannot keep up. Any suggestions in this case? – Abhishek Divekar Sep 10 '20 at 14:21
