I am building a web app using Python 2.7, the bottle micro-framework, and Apache (via mod_wsgi). The app has some RESTish endpoints, one of which results in a connection error in the browser (Firefox shows "The connection was reset" while Opera shows "Connection closed by remote server"). I have been pulling my hair out trying to debug this: the service worked until recently, and I cannot get at the errors that appear to originate in Python. So I am hoping that if I walk through some specifics, someone will be able to suggest next steps, as I am stuck...
- I have tracked the offending line of code down to a matrix multiplication between two numpy.matrixlib.defmatrix.matrix objects
- This code works just fine locally, and works on the server when calling the functionality via a Python shell. The problem is only exposed when the code is called through mod_wsgi
The problem appears to be memory-related. In debugging, I tested with fake data to remove any dependencies on the underlying database being used. Here is what works and what does not:
Works
-----
a = np.asmatrix(np.arange(140*30).reshape((140,30)))
b = np.asmatrix(np.arange(30).reshape((30,1)))
c = a * b

a = np.asmatrix(np.ones(140*30, dtype=np.float16).reshape((140,30)))
b = np.asmatrix(np.ones(30, dtype=np.float16).reshape((30,1)))
c = a * b

Fails
-----
a = np.asmatrix(np.ones(140*30, dtype=my_type).reshape((140,30)))
b = np.asmatrix(np.ones(30, dtype=my_type).reshape((30,1)))
c = a * b

where my_type is float32 or float64
When I say "fail", I mean that all I see is the connection error in the browser.
There are no errors in the Apache log file. Note that np.arange() produces integer data by default (int32 or int64, depending on the platform), and that works, but float32 does not.
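To make the works/fails comparison above easy to repeat in any environment (Python shell, WSGIRefServer, mod_wsgi), here is a minimal sketch. The helper name `multiply_ones` is hypothetical and not part of my app; it only reproduces the fake-data test for a given dtype.

```python
import numpy as np


def multiply_ones(dtype, rows=140, cols=30):
    """Multiply a (rows x cols) matrix of ones by a (cols x 1) matrix of ones."""
    a = np.asmatrix(np.ones(rows * cols, dtype=dtype).reshape((rows, cols)))
    b = np.asmatrix(np.ones(cols, dtype=dtype).reshape((cols, 1)))
    return a * b  # matrix product, as in the failing line of the app


if __name__ == "__main__":
    # Try the dtypes from the post; each entry of the product should equal cols.
    for dtype in (np.int32, np.float16, np.float32, np.float64):
        c = multiply_ones(dtype)
        print("%s: shape=%s first=%s" % (np.dtype(dtype).name, c.shape, c[0, 0]))
```

As far as I understand, NumPy typically hands float32 and float64 matrix products to the linked BLAS library (MKL in this setup), while integer and float16 products use NumPy's own loops. That would be consistent with the pattern above, though I have not confirmed it is the cause.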
As for debugging, I have tried following the advice in the excellent docs for mod_wsgi, namely Debugging and Application Issues. Specifically,
I have set LogLevel to debug and in my Python application's wsgi file set
sys.stdout=sys.stderr
and in the application conf file I set
WSGIRestrictStdout Off
WSGIRestrictStdin Off
Still, I am not seeing any Python-related errors in the log file. To be clear, I see errors in the log if I have a syntax error in my Python code, so I know Python-related errors are making it into the log file. But, I am not seeing any errors for this particular behavior.
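If the process dies inside a C extension, no Python traceback is produced, which would be consistent with the silence described above. One way to check is to log when a request enters and leaves the application. This is a minimal sketch; the class name `TraceMiddleware` and its `stream` parameter are hypothetical, not part of bottle or mod_wsgi.

```python
import sys


class TraceMiddleware(object):
    """Wrap a WSGI app and log when each request enters and leaves it."""

    def __init__(self, app, stream=None):
        self.app = app
        # Default to stderr, which mod_wsgi forwards to the Apache error log.
        self.stream = stream if stream is not None else sys.stderr

    def _log(self, message):
        self.stream.write(message + "\n")
        self.stream.flush()  # flush right away so the line survives a crash

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        self._log("request start: %s" % path)
        result = self.app(environ, start_response)
        # Only marks the return of the call; a lazy response body may still
        # be consumed by the server afterwards.
        self._log("request handed back: %s" % path)
        return result
```

In the wsgi file this would be used as `application = TraceMiddleware(application)`. A "request start" line with no matching "request handed back" line would suggest the process died mid-request rather than raising a Python exception.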
In the Debugging docs there is a section on Python Interactive Debugger. The Debugger class code works as described when I wrap my application with it and call it from a Python shell. But, when going through mod_wsgi I have not been able to get at the pdb prompt to step through the code.
One big difference between this code working recently and not working is a move between servers. We moved from one Linode-hosted system owned by my colleague to an identical system owned by me. The exception is that his Python installation was installed ad hoc, whereas I am using the AnacondaPro distribution, as it provides some nice extras for doing numerical work, namely numpy and scipy linked with Intel's MKL libraries for parallelism. I have tried to make sure that the parallelized numerics are not the issue by setting
WSGIApplicationGroup %{GLOBAL}
in the application's conf file (see the WSGIApplicationGroup section of the mod_wsgi documentation) as well as setting
export MKL_SERIAL=yes
in ~/.bashrc to force the numerics to be single-threaded.
None of this has made a difference or yielded any error messages I can act on. Again, the code works as expected from a Python shell, but going through mod_wsgi results in some buried error that I have not figured out how to surface. So, I am desperate for any guidance on how to interactively debug what is going on in the Python layer, or any ideas behind the odd matrix-multiply-and-data-types behavior.
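On the MKL_SERIAL setting above: a variable exported in ~/.bashrc only affects shells started by that user, and Apache running as a service generally does not read that file, so the setting may never have reached mod_wsgi. A minimal sketch, assuming the variables are instead set at the top of the wsgi file before numpy is first imported. The function name `force_single_threaded_mkl` is hypothetical; `MKL_NUM_THREADS` is a standard MKL variable that I did not use in my original setup. Whether this alone would avoid the failure is not established.

```python
import os


def force_single_threaded_mkl():
    """Ask MKL to run single-threaded in this process.

    Assumption: this runs before numpy (and therefore MKL) is imported,
    since the library reads these variables when it initialises.
    """
    os.environ["MKL_SERIAL"] = "yes"     # the variable used in the post
    os.environ["MKL_NUM_THREADS"] = "1"  # standard MKL thread-count variable
    return dict(
        (name, os.environ[name]) for name in ("MKL_SERIAL", "MKL_NUM_THREADS")
    )


settings = force_single_threaded_mkl()
# import numpy as np  # numpy would be imported only after this point
```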
EDIT 1: I just tested one more setup variant that works perfectly fine: I used bottle's WSGIRefServer to run the app on localhost on the server. I then set up an SSH tunnel so that I could test the API from my laptop's browser, and all the endpoints work as expected. So, one more piece of evidence that this is a mod_wsgi-related issue. I followed up on John Siu's comment and set the per-thread stack size to be smaller than the default 8MB:
WSGIDaemonProcess my_app processes=4 threads=16 stack-size=524288
It was good to find old threads on the stack issue, but unfortunately the change did not resolve the problem.
EDIT 2: Regarding @John Siu's answer... The only big difference between our configurations is Apache. Here is what I have:
# dpkg -l | grep apache
ii apache2 2.2.22-1ubuntu1.2 Apache HTTP Server metapackage
ii apache2-mpm-worker 2.2.22-1ubuntu1.2 Apache HTTP Server - high speed threaded model
ii apache2-utils 2.2.22-1ubuntu1.2 utility programs for webservers
ii apache2.2-bin 2.2.22-1ubuntu1.2 Apache HTTP Server common binary files
ii apache2.2-common 2.2.22-1ubuntu1.2 Apache HTTP Server common files
ii libapache2-mod-wsgi 3.3-4build1 Python WSGI adapter module for Apache
EDIT 3 - LESSONS LEARNED: Many thanks to @John Siu for providing suggestions and helping me debug this. We may have discovered, or at least shed some light on, a tricky issue that I have to imagine others will encounter as they use Python to develop analytic web apps. That the issue took as long as it did to debug is certainly a function of me being fairly green with Apache configuration, and fairly rusty at working in Linux. Here are some things I learned...
- I thought I was capturing all of the relevant messages in my error.log and access.log files. As soon as I looked in /var/log/apache2/error.log, as @John Siu did, I saw the same MKL error message that had been there for many days. I had no idea this log file existed. Now I know :)
- I suspected an MKL issue from the start. I thought that by setting MKL_SERIAL=yes I would be turning off any issue related to a multi-threaded server dealing with multi-threaded BLAS. Obviously this was still not sufficient, and using the prefork version of Apache was required.
The actual command I needed to remove worker and instead use prefork was
apt-get install apache2-mpm-prefork
I also came across this command as a handy way of seeing which MPM you are using (and thanks to @JohnSiu for the example of using dpkg):
apache2 -V | grep 'MPM'
which shows output like
Server MPM: Prefork
-D APACHE_MPM_DIR="server/mpm/prefork"
Sometimes a bounty is required.
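The MPM check from the lessons above can also be scripted, for example as a deployment sanity check. This is only a sketch: the function names `parse_mpm` and `current_mpm` are hypothetical, and the binary name `apache2` is an assumption based on the Ubuntu setup in this post.

```python
import re
import subprocess


def parse_mpm(apache_v_output):
    """Extract the MPM name from the text printed by `apache2 -V`."""
    match = re.search(r"^Server MPM:\s*(\S+)", apache_v_output, re.MULTILINE)
    return match.group(1) if match else None


def current_mpm(binary="apache2"):
    """Run `<binary> -V` and return the MPM name, or None if not reported."""
    output = subprocess.check_output([binary, "-V"]).decode("utf-8", "replace")
    return parse_mpm(output)


if __name__ == "__main__":
    print(current_mpm())
```

Keeping the parsing separate from the subprocess call means the parser can be checked against a captured sample of the output without Apache being installed.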
I am amazed at the labor of love that is mod_wsgi. That being said, for my needs I am starting to think gunicorn might be a better fit.