1

I recently noticed when I logged in that I had several thousand processes marked "zombie". Upon further investigation, I found the following from ps fax:

  701 ?        Ss     0:28 cron
 3363 ?        S      0:00  \_ CRON
 3364 ?        Ss     0:00      \_ /bin/sh -c   [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete
 3371 ?        S      0:00          \_ find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +24 ! -execdir fuser -s {} ; -delete
 3451 ?        S      0:02              \_ fuser -s ./sess_jns5af2mvm81e2fg1rbuctlt54
 3452 ?        Z      0:00                  \_ [fuser] <defunct>
 3453 ?        Z      0:00                  \_ [fuser] <defunct>
 3454 ?        Z      0:00                  \_ [fuser] <defunct>

... many, many lines omitted ...

13642 ?        Z      0:00                  \_ [fuser] <defunct>

As far as I can tell, this comes from the cron job in /etc/cron.d/php5 that is supposed to clean up dead PHP sessions at 9 and 39 minutes past the hour.

Edit: Here's the text of the script. It's installed by default with PHP on Ubuntu.

# /etc/cron.d/php5: crontab fragment for php5
#  This purges session files older than X, where X is defined in seconds
#  as the largest value of session.gc_maxlifetime from all your php.ini
#  files, or 24 minutes if not defined.  See /usr/lib/php5/maxlifetime

# Look for and purge old sessions every 30 minutes
09,39 *     * * *     root   [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete

For some reason (currently I'm guessing a badly behaved web crawler making a new session on each request, but I'm still looking over the logs), there are sometimes many thousands of abandoned PHP sessions in /var/lib/php5/, and when this job runs it happily spawns a new fuser process for each one. This quickly hits the process limit and slows everything to a crawl.

What can I do, besides just deleting this cron job and cleaning things up manually?

josePhoenix
  • Perhaps I am missing something, but don't the PHP config values session.gc_probability, session.gc_divisor and session.gc_maxlifetime get used to provide cleanup services for the typical sessions folder? – Matthew Ife Nov 17 '11 at 22:07

5 Answers

4

It would probably be best to move the logic out of find and into a script that loops over all the files given on the command line, checks whether each one is being accessed, and deletes it if not:

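#!/bin/bash
# Delete each file named on the command line unless some process has it open.

for x; do
  # fuser -s exits non-zero when no process is using the file
  if ! /bin/fuser -s "$x" 2>/dev/null; then
    rm -f -- "$x"
  fi
done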

Then change the cron job to just the following (thatscript.sh has to be on cron's PATH, or be given by its full path):

09,39 *     * * *     root   [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) -execdir thatscript.sh {} +

This will have find collect all the session files matching the max age, then run thatscript.sh with all of them at once (due to the + instead of ;). The script is then responsible for making sure the file is not in use and deleting it. This way, find should only have one direct child itself, and bash should not have any problem cleaning up the fuser and rm children.
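To see what the + terminator changes without touching real session files, here is a minimal sketch that runs in a scratch directory. The directory, the file names and the count_invocations helper are all made up for illustration, and it uses -exec rather than -execdir so the result does not depend on how a particular find version batches -execdir.

```shell
#!/bin/sh
# Hypothetical demo: compare how many child processes find starts
# when the command is terminated with ";" versus "+".
# Everything lives in a throwaway directory created by mktemp.

count_invocations() {
  # $1 = directory, $2 = terminator (";" or "+")
  # Each run of the inner sh prints exactly one line, so the line
  # count equals the number of child processes find started.
  find "$1" -type f -exec sh -c 'echo "$#"' sh {} "$2" | wc -l | tr -d ' '
}

demo_dir=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
  : > "$demo_dir/sess_demo_$i"   # empty stand-in for a session file
  i=$((i + 1))
done

per_file=$(count_invocations "$demo_dir" ';')
batched=$(count_invocations "$demo_dir" '+')
echo "terminated with ; : $per_file invocations"
echo "terminated with + : $batched invocation(s)"

rm -rf "$demo_dir"
```

With 50 small file names the + form should need a single child process, while the ; form starts one per file.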

find's documentation says that -exec command {} + builds its command lines in much the same way xargs does, so a file list that exceeds the OS argument-length limit should be split across several executions (13000 file names may exceed that limit on some systems; the limit comes from the kernel, not from bash). If you would rather not depend on that behavior for -execdir, change -execdir thatscript.sh {} + to -print0 | xargs -0 thatscript.sh and let xargs divide up the files.
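The way xargs divides a long list can be seen with a small sketch like this one. It assumes a find and xargs that support -print0 and -0 (GNU, BSD and BusyBox do); the scratch directory, file names and batch_sizes helper are hypothetical. The -n 2 is only there to force tiny batches so the splitting is visible; without it, xargs sizes each batch to fit the system limit.

```shell
#!/bin/sh
# Hypothetical demo: xargs splits a NUL-separated file list into
# several invocations of the command it is given.

batch_sizes() {
  # $1 = directory, $2 = maximum number of arguments per invocation
  # Each invocation prints how many file names it received.
  find "$1" -type f -print0 | xargs -0 -n "$2" sh -c 'echo "$#"' sh
}

xargs_demo_dir=$(mktemp -d)
for name in a b c d e; do
  : > "$xargs_demo_dir/sess_demo_$name"   # five empty stand-in files
done

# Join the per-invocation counts onto one line for display.
sizes=$(batch_sizes "$xargs_demo_dir" 2 | paste -s -d ' ' -)
echo "batch sizes: $sizes"

rm -rf "$xargs_demo_dir"
```

Note that with -print0 the script receives paths including the directory, whereas -execdir hands it names relative to the directory; the deletion script above handles either.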

Alternatively, if you don't have the drive mounted noatime, change -cmin to -amin and ditch the tests entirely:

    09,39 *     * * *     root   [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -amin +$(/usr/lib/php5/maxlifetime) -delete

This removes every session file last accessed more than [output of the maxlifetime command] minutes ago. The default maxlifetime on Debian seems to be 24 minutes, which would be a very long time for a page to load, so unless you have PHP processes that open a session and then sit idle for that long, this shouldn't zap any sessions currently in use.
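Before switching the cron job to -amin with -delete, it may be worth previewing what would be removed. The sketch below is one way to do that; preview_stale is a made-up helper name, and the demo runs against a scratch directory with invented file names rather than the real session directory.

```shell
#!/bin/sh
# Hypothetical dry-run helper: list what "-amin +N -delete" would
# remove, without deleting anything.

preview_stale() {
  # $1 = session directory, $2 = age threshold in minutes
  find "$1" -mindepth 1 -maxdepth 1 -type f -amin "+$2" -print
}

# Self-contained demo in a scratch directory.
amin_demo_dir=$(mktemp -d)
: > "$amin_demo_dir/sess_fresh"
: > "$amin_demo_dir/sess_stale"
# Backdate only the access time of one file (touch -a -t is POSIX).
touch -a -t 200001010000 "$amin_demo_dir/sess_stale"

stale=$(preview_stale "$amin_demo_dir" 24)
echo "would delete: $stale"

rm -rf "$amin_demo_dir"

# On the real server the equivalent check would be something like:
#   preview_stale /var/lib/php5 "$(/usr/lib/php5/maxlifetime)" | wc -l
```

If the count on the real directory looks implausibly low, the filesystem may be mounted noatime, in which case access times are not being updated and this approach will not work as intended.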

DerfK
4

I had the same problem on Ubuntu 11.10 and solved it by editing:

/etc/cron.d/php5 

and replacing the cron line with:

09,39 *     * * *     root   [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete

This is the PHP cron job as shipped with Ubuntu 11.04.

1

Fix the script so that it waits for its children or ignores SIGCHLD. Can you put the script somewhere we can see it?

Update: It looks like you're triggering a bug in find!
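To check which process is leaving the zombies behind, it can help to group them by parent PID. The following is a minimal sketch; zombies_by_parent is a made-up helper, and the sample input is modeled on the listing in the question (where the defunct entries sit under PID 3451) rather than captured from a live system.

```shell
#!/bin/sh
# Hypothetical helper: count zombie processes per parent PID.
# Input:  lines of "PPID STAT" on stdin.
# Output: lines of "PPID COUNT", sorted numerically by PPID.

zombies_by_parent() {
  awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print p, n[p] }' | sort -n
}

# Canned sample: two live processes (ignored) and three defunct
# children whose parent is PID 3451.
sample_report=$(printf '%s\n' '3364 S' '3371 S' '3451 Z' '3451 Z' '3451 Z' | zombies_by_parent)
echo "$sample_report"

# On a live system with procps ps, feed it real data instead:
#   ps -eo ppid=,stat= | zombies_by_parent
```

The parent PID with the largest count is the process that has not yet collected its children's exit statuses.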

David Schwartz
  • Oops, forgot that part. I edited the question to include the script. – josePhoenix Oct 29 '11 at 23:36
  • Good heavens. Who should I tell about that? – josePhoenix Oct 29 '11 at 23:47
  • Good question! It looks like `find` needs to either ignore or handle SIGCHLD. Does your platform have [GNU find](http://www.gnu.org/software/findutils/)? What does `find --version` output? – David Schwartz Oct 29 '11 at 23:50
  • Looks like I'm running `find (GNU findutils) 4.4.2`, which has its bugs at http://savannah.gnu.org/bugs/?group=findutils . I'll try posting it there, though I'm not sure I'm a savvy enough `find` user to explain what is going on. – josePhoenix Oct 29 '11 at 23:53
  • What's happening is that a find condition is requiring a large number of child processes to be started by find, and find is not reaping their zombies. – David Schwartz Oct 29 '11 at 23:55
  • Ah, another inaccuracy in my question. `find` does _eventually_ reap them, but I think it's spawning all the processes before it starts reading their exit codes. Would that be an accurate assessment? – josePhoenix Oct 29 '11 at 23:59
  • That could be. I can't tell from just what you've posted. But that's a more likely failure mode that would cause the same symptoms. – David Schwartz Oct 29 '11 at 23:59
1

I solved this for a client by moving the sessions from the file system to memcache. They didn't have the zombie processes, but they still had zillions of sessions that the cron job couldn't keep up with deleting. It took about 10 minutes to install memcache, reconfigure php.ini, test it out, and add some munin graphs to watch the memcache size. Presto: server load decreased, everyone happy.

http://www.dotdeb.org/2008/08/25/storing-your-php-sessions-using-memcached/
http://www.ducea.com/2009/06/02/php-sessions-in-memcached/

Wim Kerkhoff
1

This bug report is helpful as well:

https://bugs.launchpad.net/ubuntu/+source/php5/+bug/876387

Read comments #4 and #8; the latter even fixes fuser itself!

andy