I have an Nginx + Gunicorn + Flask application that serves some APIs backed by a machine learning model. The model is stored in a large file and takes a long time (about one minute) to load into memory. Now I want to update the model file and restart the workers without downtime. How can I configure Gunicorn (or something else) to wait until the new worker processes have finished loading the model before they replace the old ones?
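For reference, here is a minimal sketch of the kind of app described above, assuming the model is loaded once at module import time. `load_model`, `MODEL_PATH`, the `/predict` route and the shortened sleep are hypothetical placeholders standing in for the real code. The real loader takes about a minute.

```python
import time

from flask import Flask, jsonify

# Hypothetical stand-in for the real loading time (about one minute in practice);
# shortened here so the sketch runs quickly.
LOAD_SECONDS = 0.1


def load_model(path):
    """Hypothetical loader: pretends to read a large model file and returns a predictor."""
    time.sleep(LOAD_SECONDS)
    return lambda x: {"input": x, "path": path}


# Loaded at import time: any process that imports this module spends the full
# load time here before the Flask app object even exists.
MODEL_PATH = "model.bin"  # hypothetical path to the model file
model = load_model(MODEL_PATH)

app = Flask(__name__)


@app.route("/predict/<value>")
def predict(value):
    # Every request uses the model that was loaded when the module was imported.
    return jsonify(model(value))
```

Assuming the file is saved as `app.py`, it would be served with something like `gunicorn app:app` behind Nginx, so each worker pays the loading cost when it starts.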
Just sending `kill -HUP` seems to be insufficient for this purpose: requests that arrive before the new process has finished loading seem to be dropped.