We are having an issue with a VPS running Plesk 9.5 on Ubuntu 8.04. At seemingly random intervals Apache disappears and has to be started manually. I have checked the Apache error log, /var/log/messages, and the individual virtual hosts' Apache error logs, and cannot find anything that coincides with the time of the failure. dmesg is empty, which is a bit odd.
We have also had the psa service go down for no apparent reason while Apache stayed up.
I'm at a loss as to how to diagnose this, because none of the log files I can find point to any issues. Are there any others I can look at?
Memory usage sits at about 55% (out of 400 MB), and it isn't a particularly high-traffic server.
Any pointers as to where else I can find out what is going on would be very much appreciated.
Nick
Update:
I have been running watchdog for a while now, and it restarts processes when they go down. Unfortunately, it is quite often more than just Apache that goes down (although sometimes it is only Apache), and there seems to be no pattern to it. We also get courier and qmail going down. Anyway, I have raised the logging level for Apache and noticed the following:
[Mon Mar 07 16:46:14 2011] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 0 idle, and 21 total children
[Mon Mar 07 16:49:56 2011] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 0 idle, and 12 total children
[Mon Mar 07 16:50:08 2011] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 0 idle, and 28 total children
[Mon Mar 07 16:50:09 2011] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 0 idle, and 36 total children
[Mon Mar 07 16:50:14 2011] [info] [client ipaddressofserver] (32)Broken pipe: core_output_filter: writing data to the network
[Mon Mar 07 16:50:14 2011] [info] removed PID file /var/run/apache2.pid (pid=9556)
[Mon Mar 07 16:50:14 2011] [notice] caught SIGWINCH, shutting down gracefully
[Mon Mar 07 16:50:18 2011] [emerg] (22)Invalid argument: mod_fcgid: can't get lock, pid: 9557
[Mon Mar 07 16:50:24 2011] [info] Init: Seeding PRNG with 0 bytes of entropy
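To make the sequence in these entries easier to follow, here is a minimal Python sketch that pulls the child counts and the shutdown time out of the excerpt. The regular expressions and the function names (`parse_events`, `summarize`) are assumptions made for illustration, based only on the lines shown above; they are not part of any Apache tooling and may not match other log formats.

```python
import re
from datetime import datetime

# Lines copied from the error log excerpt above.
LOG_EXCERPT = """\
[Mon Mar 07 16:46:14 2011] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 0 idle, and 21 total children
[Mon Mar 07 16:49:56 2011] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 0 idle, and 12 total children
[Mon Mar 07 16:50:08 2011] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 0 idle, and 28 total children
[Mon Mar 07 16:50:09 2011] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 0 idle, and 36 total children
[Mon Mar 07 16:50:14 2011] [info] [client ipaddressofserver] (32)Broken pipe: core_output_filter: writing data to the network
[Mon Mar 07 16:50:14 2011] [info] removed PID file /var/run/apache2.pid (pid=9556)
[Mon Mar 07 16:50:14 2011] [notice] caught SIGWINCH, shutting down gracefully
"""

# "[timestamp] [level] message" as it appears in the excerpt (assumed layout).
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")
# The counters reported in the "server seems busy" messages.
BUSY_RE = re.compile(
    r"spawning (\d+) children, there are (\d+) idle, and (\d+) total children"
)


def parse_events(text):
    """Return (timestamp, level, message) tuples for lines that match LINE_RE."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not in the expected layout
        ts = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
        events.append((ts, match.group("level"), match.group("msg")))
    return events


def summarize(events):
    """Collect (timestamp, total children) pairs and the first SIGWINCH time."""
    busy = []
    shutdown = None
    for ts, _level, msg in events:
        counters = BUSY_RE.search(msg)
        if counters:
            busy.append((ts, int(counters.group(3))))
        elif "caught SIGWINCH" in msg and shutdown is None:
            shutdown = ts
    return busy, shutdown


if __name__ == "__main__":
    busy, shutdown = summarize(parse_events(LOG_EXCERPT))
    for ts, total in busy:
        print(ts.time(), "total children:", total)
    if busy and shutdown:
        gap = (shutdown - busy[-1][0]).total_seconds()
        print("SIGWINCH logged", gap, "seconds after the last busy entry")
```

Read this way, the excerpt shows the total climbing from 12 to 36 children between 16:49:56 and 16:50:09, with the graceful shutdown logged five seconds after the last "server seems busy" entry.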
I have already been increasing Min/MaxSpareServers, but slowly and while keeping an eye on memory usage. Surely that can't be causing Apache, courier and qmail to fail?
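One way to sanity-check the memory side of this is a back-of-envelope budget. The sketch below is only that: the 400 MB total and the 36 children come from this post, while the per-child size and the amount reserved for everything else (`PER_CHILD_MB`, `RESERVED_MB`) are placeholder assumptions that would have to be replaced with measured values. The helper name `max_children` is hypothetical.

```python
def max_children(total_mb, reserved_mb, per_child_mb):
    """Number of Apache children that fit in RAM after reserving memory
    for everything else on the box (mail, Plesk, and so on)."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    return max(0, int((total_mb - reserved_mb) // per_child_mb))


TOTAL_MB = 400          # from the post: memory available to the VPS
OBSERVED_CHILDREN = 36  # from the log: highest "total children" reported
PER_CHILD_MB = 15       # ASSUMPTION: placeholder, measure the real figure
RESERVED_MB = 150       # ASSUMPTION: placeholder for qmail, courier, psa, etc.

limit = max_children(TOTAL_MB, RESERVED_MB, PER_CHILD_MB)
needed_mb = RESERVED_MB + OBSERVED_CHILDREN * PER_CHILD_MB

print("children that fit in the budget:", limit)
print("memory needed for", OBSERVED_CHILDREN, "children:", needed_mb, "MB")
print("exceeds total memory:", needed_mb > TOTAL_MB)
```

With these placeholder numbers the budget allows 16 children, while 36 children would need about 690 MB. Whether anything like that is happening here depends entirely on the real per-child figure.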
Any help with the log entries and what they indicate would be appreciated.
Cheers Nick