I have written a webapp in Django that I host under Apache 2.4 using mod_wsgi.
At first glance, it works fine. However, once the Apache process has been running for a while, the webapp starts responding with 503 errors. A request sometimes gets through after refreshing (F5) many times.
The annoying thing is that I don't have anything related to this in my Apache error.log at all, so I don't know where to start debugging. My only lead is the error displayed in the browser: 503 Service Unavailable.
Other websites (PHP) are not affected. Restarting the Apache server reliably resolves the issue immediately.
My Apache config for this webapp:
WSGIDaemonProcess app python-path=/opt/app home=/opt/app
WSGIProcessGroup app
WSGIScriptAlias /app /opt/app/wsgi.py
I have found this, but I don't think that's the problem because the sockets are already placed in /var/run/apache2/ in my case (not the "apache log directory"). This directory is readable to others. The sockets themselves have mode 700, but have the correct owner (www-data).
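For anyone who wants to verify the same thing on their own machine, the permission check described above could be scripted roughly like this. It is only a sketch: `describe_path` and `report` are hypothetical helper names, not part of Apache or mod_wsgi, and it assumes a Unix system where the `pwd` module is available.

```python
import os
import pwd
import stat


def describe_path(path):
    """Return (mode string, owner name) for a path, e.g. ('srwx------', 'www-data')."""
    # lstat so that a dangling symlink in the directory does not raise.
    info = os.lstat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)  # uid without a passwd entry
    return stat.filemode(info.st_mode), owner


def report(directory):
    """Print mode and owner of a directory and of every entry directly inside it."""
    print(directory, *describe_path(directory))
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        print(full, *describe_path(full))
```

Calling `report("/var/run/apache2")` as a user who can read that directory would list the directory and the socket files with their mode and owner, which makes it easy to compare against the values quoted above.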
Any ideas?
EDIT:
I noticed that when the problem occurs and I restart Apache to fix it, Apache takes quite long to exit. In the error log I found these messages:
[Sun Jun 25 14:14:18.935566 2017] [core:warn] [pid 374:tid 140587055682752] AH00045: child process 21761 still did not exit, sending a SIGTERM
[Sun Jun 25 14:14:18.935637 2017] [core:warn] [pid 374:tid 140587055682752] AH00045: child process 21812 still did not exit, sending a SIGTERM
[Sun Jun 25 14:14:20.937578 2017] [core:warn] [pid 374:tid 140587055682752] AH00045: child process 21761 still did not exit, sending a SIGTERM
[Sun Jun 25 14:14:20.937640 2017] [core:warn] [pid 374:tid 140587055682752] AH00045: child process 21812 still did not exit, sending a SIGTERM
[Sun Jun 25 14:14:22.939827 2017] [core:warn] [pid 374:tid 140587055682752] AH00045: child process 21761 still did not exit, sending a SIGTERM
[Sun Jun 25 14:14:22.939893 2017] [core:warn] [pid 374:tid 140587055682752] AH00045: child process 21812 still did not exit, sending a SIGTERM
[Sun Jun 25 14:14:24.942034 2017] [core:error] [pid 374:tid 140587055682752] AH00046: child process 21761 still did not exit, sending a SIGKILL
[Sun Jun 25 14:14:24.942176 2017] [core:error] [pid 374:tid 140587055682752] AH00046: child process 21812 still did not exit, sending a SIGKILL
So by the looks of it, the problem might be caused by hung (WSGI?) threads. How do I proceed to debug this?
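One idea I could try, if hung threads really are the cause, is to have the Python process print the stack of every thread so I can see where each one is stuck. Below is a minimal sketch using only the standard library. The names `format_thread_stacks` and `dump_thread_stacks` are made up for illustration. I am also assuming that output written to `sys.stderr` under mod_wsgi ends up in the Apache error log, and how the dump gets triggered (for example from a debug-only view or a background timer thread) is left open.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return one string containing the current stack of every thread in this process."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps thread id -> topmost frame of that thread.
    for thread_id, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (thread_id, names.get(thread_id, "unknown")))
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


def dump_thread_stacks(stream=None):
    """Write all thread stacks to a stream (sys.stderr by default)."""
    if stream is None:
        stream = sys.stderr  # resolved at call time, not at import time
    stream.write(format_thread_stacks())
    stream.flush()
```

If every request thread shows the same blocking call in such a dump (a lock, a database call, a socket read), that would point at what is exhausting the daemon process.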