
I have a Linux server with 16 GB of RAM and 8 cores. It never goes into swap and the CPU load never exceeds ~1.5, so I believe it is safe to say there is plenty of capacity.

Occasionally I get a warning like `[warn] mod_fcgid: process 28341 graceful kill fail, sending SIGKILL`.

This is on Apache/2.2.15 (CentOS 6.3) with mod_fcgid/2.3.7. None of the mod_fcgid settings below are set, so they are all at their defaults:

FcgidMinProcessesPerClass
FcgidMaxProcessesPerClass
FcgidMaxProcesses
FcgidIdleTimeout
FcgidProcessLifeTime
FcgidIdleScanInterval
FcgidOutputBufferSize

I want to identify which vhost's processes have been getting SIGKILLed, so I loaded mod_status and turned ExtendedStatus on. I set log_server_status to run every minute, since I can't afford to manually reload the /server-status/ page while keeping an eye on the logs all day, waiting for a SIGKILL to happen.

But the output of log_server_status is not very helpful. This is all I see in the logs created by the script:

180131::::
all the way to
235501::::
235601::::
235701::::
235801::::
235901::::

I want to track down the vhosts responsible for the SIGKILLs. How do I go about it? Am I doing something wrong with regard to log_server_status? The output seems useless...

Gaia
  • The first step is to identify the process's PID. You can do this in two ways: either have your CGI code log its process ID and name, or have a cron job run every 1-5 minutes capturing the CGI processes with names and PIDs. You can even tail the error log in real time with a script that emails you when a process gets SIGKILLed. Then you don't need the cron job constantly running and using up disk space, if that's an issue. – Danie Feb 06 '13 at 10:51
  • I have a script tailing the error log and I get alerted when a SIGKILL happens. I don't have a cron job capturing the CGI processes with names and PIDs every 1-5 minutes, but even if I had names and PIDs I would not be able to pinpoint the responsible vhost. And that's exactly what I am hoping to do with log_server_status... Thanks @Danie – Gaia Feb 06 '13 at 11:05
  • So the only way to get the responsible vhost is to have the CGI process log which vhost the request came from. Essentially, you would need to build a log file per request to the CGI containing the date, vhost, CGI name, CGI process ID, and even the parameters if you wish. Oh, by the way, ExtendedStatus is just used for getting stats from Apache. – Danie Feb 06 '13 at 12:15
  • Maybe log_server_status is meant for getting stats only, but ExtendedStatus matches vhost to process when viewed in the regular (non-`?auto`) mode. – Gaia Feb 06 '13 at 13:48

2 Answers


You seem to be running PHP through mod_fcgid. As long as the same wrapper is used to start the PHP interpreter for all vhosts, the processes spawned by mod_fcgid are shared across vhosts, since you appear to have no vhost-specific fcgid directives. They keep running after startup and are reused to run whatever PHP code is passed to them for processing (which is the whole point of mod_fcgid, by the way). Refer to the mod_fcgid documentation for details.

There is a documented bug that breaks this behavior: under certain conditions, PHP processes may be spawned for each vhost regardless of any defined per-class limits. That bug only applies to the old 2.3.6 version of the module and has been fixed in 2.3.7.

Other than that, the log warnings you are seeing are not due to resource exhaustion; this is normal mod_fcgid activity. mod_fcgid terminates its running processes periodically (after an idle timeout, after a certain lifetime, or after a certain number of requests). It does so by sending a SIGTERM to the process. If the process does not handle the SIGTERM in time for some reason (it might be too busy, or it might simply be catching and ignoring SIGTERM), it is ended forcibly with a SIGKILL. This is what the warning is about.
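The SIGTERM-then-SIGKILL sequence described above can be reproduced outside Apache. This is only a minimal sketch of the general pattern, not mod_fcgid's actual implementation: `CHILD_CODE`, `graceful_kill`, `demo` and the grace period are hypothetical names and values, and the child process merely stands in for a PHP worker that ignores SIGTERM. It assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a stuck FastCGI worker: it ignores SIGTERM.
CHILD_CODE = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def graceful_kill(proc, grace_seconds=1.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the name of the last signal that had to be sent.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()
        return "SIGKILL"


def demo():
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # block until the child has installed its handler
    result = graceful_kill(proc, grace_seconds=0.5)
    proc.stdout.close()
    return result, proc.returncode


if __name__ == "__main__":
    print(demo())
```

A worker that exits promptly on SIGTERM would never reach the `kill()` branch, which is why the warning only shows up for processes that are busy or ignoring the signal.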

If you are unhappy with the timing of the process terminations, adjust the respective parameters with the FcgidIdleTimeout, FcgidProcessLifeTime and FcgidMaxRequestsPerProcess directives.

the-wabbit
  • I am using the defaults for all three settings. Shouldn't 300 seconds be enough for FcgidIdleTimeout? If something stays busy for longer than that there is something wrong with my application, and I would like to find out. If the process is ignoring SIGTERM requests I would like to find out which apps are doing so. – Gaia Feb 06 '13 at 14:30
  • Update done, thanks. I understood that if the process has been idle but doesn't respond to a SIGTERM it will be killed. Since it may not respond to a SIGTERM because the process believes it is busy, when in fact it is stuck, it gets SIGKILLed. Incorrect understanding? – Gaia Feb 06 '13 at 21:23
  • Understood. I would like to find out which vhost they belong to and why they are being killed, hence the OP. Thanks – Gaia Feb 06 '13 at 22:26
  • I did post the config as an update to the question yesterday. I would like to know how to find out why they are being killed, and I still don't understand how a process is not linked to a vhost when server-status shows output such as: `Process: php5.fcgi (/home/ae/fcgi-bin/php5.fcgi) Pid Active Idle Accesses State 32174 1879 199 23 Ready 30028 2808 108 60 Ready Process: php5.fcgi (/home/ae/fcgi-bin/php5.fcgi) Pid Active Idle Accesses State 30056 2753 342 7 Ready Process: php5.fcgi (/home/b/fcgi-bin/php5.fcgi) Pid Active Idle Accesses State 29935 2873 173 75 Ready 29932 2874 108 81 Ready` – Gaia Feb 07 '13 at 07:46
  • let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/7421/discussion-between-syneticon-dj-and-gaia) – the-wabbit Feb 07 '13 at 08:34
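
The server-status excerpt quoted in the comments groups PIDs under the wrapper path that started them. Where each user or vhost has its own wrapper, as the `/home/ae/...` and `/home/b/...` paths suggest, a PID from the SIGKILL warning can be mapped back to a wrapper. The following is a rough sketch only: it assumes the status page has already been reduced to plain text in the layout shown above, and `map_pids_to_wrappers` is a hypothetical helper, not part of Apache or mod_fcgid.

```python
import re

# One "Process: name (path)" header followed by its rows, up to the next header.
BLOCK_RE = re.compile(r"Process:\s+\S+\s+\(([^)]+)\)(.*?)(?=Process:|\Z)", re.S)
# One table row: Pid Active Idle Accesses State
ROW_RE = re.compile(r"(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\w+)")


def map_pids_to_wrappers(status_text):
    """Return {pid: wrapper_path} parsed from the plain-text status excerpt."""
    pid_to_wrapper = {}
    for wrapper, body in BLOCK_RE.findall(status_text):
        for pid, _active, _idle, _accesses, _state in ROW_RE.findall(body):
            pid_to_wrapper[int(pid)] = wrapper
    return pid_to_wrapper


if __name__ == "__main__":
    sample = (
        "Process: php5.fcgi (/home/ae/fcgi-bin/php5.fcgi) "
        "Pid Active Idle Accesses State 32174 1879 199 23 Ready "
        "Process: php5.fcgi (/home/b/fcgi-bin/php5.fcgi) "
        "Pid Active Idle Accesses State 29935 2873 173 75 Ready"
    )
    for pid, wrapper in map_pids_to_wrappers(sample).items():
        print(pid, wrapper)
```

This only helps if a snapshot was taken while the process was still alive, so it would have to be captured periodically and looked up after the warning appears.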

I had to manually comb the Apache error logs, daily, for entries that occurred at the same time the SIGKILL messages were being logged in syslog. This allowed me to find which vhosts' processes were getting SIGKILLed. I then started to monitor (manually) which files were being accessed at those timestamps on those vhosts, and after a few days I had enough data to track down which PHP files were generating errors.
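
The manual combing described above could in principle be scripted. The sketch below is illustrative and was not what I actually ran: it assumes every log line starts with an Apache 2.2 style timestamp such as `[Wed Feb 06 10:51:03 2013]` (syslog lines would need a different parser), and the names `find_kills`, `correlate` and the 5-second window are hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# Assumed Apache 2.2 error-log timestamp, e.g. [Wed Feb 06 10:51:03 2013]
TS_RE = re.compile(r"^\[(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4})\]")
KILL_RE = re.compile(r"mod_fcgid: process (\d+) graceful kill fail")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if absent."""
    match = TS_RE.match(line)
    if not match:
        return None
    return datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")


def find_kills(lines):
    """Return (timestamp, pid) for every SIGKILL warning in the given lines."""
    kills = []
    for line in lines:
        match = KILL_RE.search(line)
        ts = parse_ts(line)
        if match and ts:
            kills.append((ts, int(match.group(1))))
    return kills


def correlate(kills, vhost_logs, window_seconds=5):
    """Map each kill to the vhost log lines logged within the time window.

    vhost_logs is {vhost_name: iterable_of_log_lines}.
    """
    window = timedelta(seconds=window_seconds)
    hits = {}
    for kill_ts, pid in kills:
        for vhost, lines in vhost_logs.items():
            for line in lines:
                ts = parse_ts(line)
                if ts and abs(ts - kill_ts) <= window:
                    hits.setdefault((kill_ts, pid), []).append((vhost, line.rstrip()))
    return hits
```

Feeding it the main error log and one list of lines per vhost error log yields, for each SIGKILL, the vhost entries that were logged around the same moment. The result still needs a human look, since unrelated requests can fall inside the same window.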

The problem is solved and I am not getting any more SIGKILL warnings.

As a side note, which applies only to my specific case: the warnings came from Magento cron entries that were not able to finish within the maximum allowed script execution time. So I increased the execution time to 180 seconds (for a couple of days) and those cron jobs started to finish successfully. I then reduced the maximum allowed time again, and they now finish in under 60 seconds. The long execution time was because a few jobs had not run in a long time and had a larger than usual load to deal with.

Gaia