Why would Django fcgi just die? How can I find out?

Question

I'm running Django on Linux using fcgi and Lighttpd. Every now and again (about once a day) the server just dies. I'm using the latest stable release of Django, Python and Lighttpd.

The only thing I can think of is that my program is opening a lot of files and executing a lot of external processes, but I'm fairly sure that side of things is watertight.

Looking at the error and access logs, there's nothing exceptional happening (i.e. load isn't above normal). On those occasions where I have had exceptions from Python, these have shown up in the error.log, but when this crash happens I get nothing.

Is there any way of finding out why the process died, short of putting logging statements on every single line? I can't reproduce the crash, so I don't know exactly where to look.

Edit

It's the Django process that's dying. I'm running the server with: manage.py runfcgi daemonize=true method=threaded host=127.0.0.1 port=12345

Have you looked for core files? Have you set your rlimits to permit core files? — jemfinch, Apr 08 '10 at 13:40
Can you just run the server from the command line, in a non-daemonizing debug mode? — Mike DeSimone, Apr 08 '10 at 13:41
Reading the question again, one thing is not entirely clear: is it the lighttpd daemon dying, or your own FastCGI process? — Thomas, Apr 08 '10 at 13:45
@Mike - I could, but the problem is that the site is dying in production, and I want the production server to be running as a daemon (don't I?). The test site runs fine. I will try stress-testing a site with non-daemonizing mode and see what happens. — Joe, Apr 08 '10 at 13:51
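On jemfinch's point about rlimits: a minimal sketch of raising the core-file limit from inside the Python process itself, using the standard `resource` module. The function name `enable_core_dumps` is made up for illustration, and where (or whether) a core file is actually written still depends on the system's core pattern and on the process's working directory being writable, which a daemonized process may have changed.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit for this process.

    Hypothetical helper: call it early, before the FastCGI server starts.
    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

If the hard limit is already 0, this changes nothing and the limit has to be raised by whatever launches the process (for example `ulimit -c` in the startup script).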

score 2 · Accepted Answer · answered Apr 09 '10 at 11:52

You could edit manage.py to redirect stderr to a file, assuming runfcgi doesn't do that itself:

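import sys

# Guard against a missing subcommand, then send Python-level stderr to a log file.
if len(sys.argv) > 1 and sys.argv[1] == "runfcgi":
    # buffering=1 (line-buffered) so messages reach disk before a crash
    sys.stderr = open("/path/to/my/django-error.log", "a", 1)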
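One caveat with reassigning `sys.stderr`: it only captures writes made through Python's `sys.stderr` object. Anything written directly to file descriptor 2, such as messages from C extensions or from the external processes the question mentions, bypasses it. A hedged sketch of redirecting at the descriptor level instead (the helper name `redirect_stderr_fd` is hypothetical, and a later daemonizing step that re-points descriptor 2 would undo it):

```python
import os
import sys


def redirect_stderr_fd(path):
    """Point file descriptor 2 at `path` (append mode).

    Hypothetical helper. Child processes that inherit stderr and C-level
    writes to fd 2 end up in the same file as Python's own error output.
    Returns the open file object so the caller can keep it alive.
    """
    log = open(path, "a")
    sys.stderr.flush()        # push out anything still buffered
    os.dup2(log.fileno(), 2)  # fd 2 now refers to the log file
    return log
```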

Mike DeSimone


Thanks for the suggestion. I think that as I was getting various exceptions in lighttpd's error.log (for unrelated reasons), stderr is already logged. Suffice it to say, the log is empty when the process dies. – Joe Apr 09 '10 at 15:04

score 0 · Answer 2 · answered Apr 16 '10 at 14:37


Is this on your own server (do you own the box)? I've had that problem on shared hosting, where the host was simply killing long-running processes. Do you know if your fcgi process is receiving a SIGTERM?
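One way to check for a SIGTERM is to log catchable signals before exiting. The following is a rough sketch, not something Django or the FastCGI layer provides: `log_termination_signals` and the log path are invented for illustration, and the FastCGI server library may install its own handlers later, which would replace these. SIGKILL cannot be caught at all, so a death that leaves this log empty as well would be consistent with a SIGKILL (for instance from the kernel's out-of-memory killer, which records its kills in the kernel log).

```python
import os
import signal
import time


def log_termination_signals(log_path, signums=(signal.SIGTERM, signal.SIGHUP)):
    """Install handlers that record a catchable signal, then exit.

    Hypothetical helper; call it from the main thread before the server
    starts serving requests.
    """
    def handler(signum, frame):
        with open(log_path, "a") as log:
            log.write("%s pid=%d received signal %d\n"
                      % (time.ctime(), os.getpid(), signum))
        # Exit with the conventional 128 + signal number status.
        raise SystemExit(128 + signum)

    for signum in signums:
        signal.signal(signum, handler)
```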


Jared Forsyth


Do you know what process would be sending those messages? It's my [virtual] box. I have a couple of Django processes. This is the only one dying. – Joe Apr 16 '10 at 16:13

score 0 · Answer 3 · answered Apr 18 '10 at 21:57


I've had the same problems. Not only do the processes die without warning or apparent reason, they also leak like crazy, with threads left stuck without a master process. We worked around it with a cron job that runs every 5 minutes, checks whether the port is accepting connections, and restarts the server if it is not.

By the way, we've since given up on fcgi and are slowly migrating over to uwsgi.
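The cron-based watchdog described in this answer could look roughly like the sketch below. It is an illustration under assumptions rather than the script the answerer actually used: the function names are invented, and the restart command is whatever starts the FastCGI process on your system.

```python
import socket
import subprocess


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


def ensure_running(host, port, restart_cmd):
    """Run restart_cmd (a list of arguments) if nothing accepts connections.

    Returns True if a restart was attempted, False if the port was up.
    """
    if port_is_open(host, port):
        return False
    subprocess.call(restart_cmd)
    return True

# Example call, using the host and port from the question and a
# hypothetical restart script:
# ensure_running("127.0.0.1", 12345, ["/path/to/restart-django.sh"])
```

Run from cron with a schedule such as `*/5 * * * *`, this only confirms that the port accepts connections. A process that is hung but still listening would pass the check, and a restart papers over the crash without explaining it.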


Peter Bengtsson


I came to the same conclusion but with a 1 minute interval. Has uwsgi solved your problem? – Joe Apr 19 '10 at 21:08
Don't know yet. Haven't seen it crash yet at least. – Peter Bengtsson Apr 21 '10 at 15:06
