I am a data scientist / machine learning developer. Sometimes I have to expose my models via an HTTP endpoint. I usually do this with Flask and gunicorn:
exampleproject.py:
import random

from flask import Flask

app = Flask(__name__)
random.seed(0)


@app.route("/")
def hello():
    x = random.randint(1, 100)
    y = random.randint(1, 100)
    return str(x * y)


if __name__ == "__main__":
    app.run(host='0.0.0.0')
wsgi.py:
from exampleproject import app

if __name__ == "__main__":
    app.run()
Run with:
$ gunicorn --bind 0.0.0.0:5000 wsgi:app
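By default gunicorn starts a single synchronous worker, so only one request is handled at a time and the other concurrent connections queue up. A sketch of a multi-worker invocation (the worker count of 4 is a placeholder, not a recommendation for your machine):

```shell
# Start gunicorn with several worker processes so requests are served in parallel.
# A common rule of thumb from the gunicorn docs is (2 * CPU cores) + 1 workers.
gunicorn --workers 4 --bind 0.0.0.0:5000 wsgi:app
```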
When I benchmark this simple script, I get:
$ ab -s 30 -c 200 -n 25000 -v 1 http://localhost:5000/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 2500 requests
Completed 5000 requests
Completed 7500 requests
Completed 10000 requests
Completed 12500 requests
Completed 15000 requests
Completed 17500 requests
Completed 20000 requests
Completed 22500 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 24941 requests completed
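One way to tell client-side connection churn apart from server slowness is to re-run the same benchmark with HTTP keep-alive, so ab reuses connections instead of opening a fresh one per request (a diagnostic sketch, assuming the same local setup):

```shell
# -k enables ApacheBench's keep-alive mode; if the timeout disappears,
# the bottleneck is connection setup/teardown, not request handling.
ab -k -s 30 -c 200 -n 25000 -v 1 http://localhost:5000/
```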
With fewer total requests, it looks fine:
$ ab -l -s 30 -c 200 -n 200 -v 1 http://localhost:5000/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests
Server Software: gunicorn/19.9.0
Server Hostname: localhost
Server Port: 5000
Document Path: /
Document Length: Variable
Concurrency Level: 200
Time taken for tests: 0.084 seconds
Complete requests: 200
Failed requests: 0
Total transferred: 32513 bytes
HTML transferred: 713 bytes
Requests per second: 2380.19 [#/sec] (mean)
Time per request: 84.027 [ms] (mean)
Time per request: 0.420 [ms] (mean, across all concurrent requests)
Transfer rate: 377.87 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 2 1.2 2 3
Processing: 1 36 16.8 41 52
Waiting: 1 36 16.8 41 52
Total: 4 37 15.8 43 54
Percentage of the requests served within a certain time (ms)
50% 43
66% 51
75% 51
80% 52
90% 52
95% 52
98% 53
99% 53
100% 54 (longest request)
Is there something I can change to improve the configuration for my kind of workload?
When I execute a single call of my real model, I get an answer in about 0.5s; an execution time of up to 1.0s would be acceptable. Every call is stateless, so each request is independent of all others.
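Since every call is stateless and takes up to ~1s, scaling out synchronous workers is the usual first lever. A sketch of a gunicorn config file (the filename and worker formula are conventions, not requirements):

```python
# gunicorn.conf.py -- a sketch; start with: gunicorn -c gunicorn.conf.py wsgi:app
# The worker formula is gunicorn's documented rule of thumb, not a hard rule.
import multiprocessing

bind = "0.0.0.0:5000"
workers = multiprocessing.cpu_count() * 2 + 1
timeout = 30  # seconds; model calls of up to ~1s leave plenty of headroom
```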
When I tried to analyze this problem, I saw a lot of connections in the TIME_WAIT state:
$ netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n
1 established)
1 Foreign
2 CLOSE_WAIT
4 LISTEN
10 SYN_SENT
60 SYN_RECV
359 ESTABLISHED
13916 TIME_WAIT
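Without keep-alive, ab opens a new TCP connection per request, and each closed connection lingers in TIME_WAIT on the client before its port can be reused. A back-of-the-envelope check, assuming common Linux defaults (verify yours with `sysctl net.ipv4.ip_local_port_range`):

```python
# Rough estimate of how many fresh connections per second the client can
# sustain before exhausting ephemeral ports. Both numbers below are assumed
# Linux defaults, not measured values from this machine.
ephemeral_ports = 60999 - 32768 + 1   # default net.ipv4.ip_local_port_range
time_wait_seconds = 60                # typical TIME_WAIT duration on Linux
sustainable_rate = ephemeral_ports / time_wait_seconds
print(f"~{sustainable_rate:.0f} new connections/second before port exhaustion")
```

At roughly 2380 requests per second, the benchmark churns through ports several times faster than TIME_WAIT releases them, which is consistent with the 13916 TIME_WAIT sockets above and would explain why only the longer run hits the apr_pollset_poll timeout.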
How can I confirm / falsify that this is the problem? Is this in any way related to Flask / gunicorn? How does nginx relate to gunicorn?