3

I can't figure out why Apache hangs after a few days of uptime.

Here is the output of top, sorted by memory usage:

top - 14:51:45 up 1 day, 18:02,  3 users,  load average: 6.73, 5.15, 6.27
Tasks: 233 total,   1 running, 226 sleeping,   0 stopped,   6 zombie
Cpu(s): 34.0%us, 13.8%sy,  0.0%ni,  3.2%id, 48.3%wa,  0.0%hi,  0.8%si,  0.0%st
Mem:   4043688k total,  3943568k used,   100120k free,    46784k buffers
Swap:  1051376k total,   659504k used,   391872k free,   372016k cached


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17156 apache    40   0  271m 204m 5800 S  0.0  5.2   0:10.53 httpd
16735 apache    40   0  273m 204m 5504 S  0.0  5.2   0:05.32 httpd
17532 apache    40   0  271m 204m 5188 S 81.1  5.2   0:07.83 httpd
17904 apache    40   0  271m 204m 5396 S  0.0  5.2   0:09.81 httpd
17177 apache    40   0  271m 203m 5248 S  0.0  5.2   0:05.63 httpd
19507 apache    40   0  271m 203m 5272 S  0.0  5.2   0:05.14 httpd
16734 apache    40   0  271m 203m 5380 S  0.0  5.2   0:10.20 httpd
18571 apache    40   0  271m 203m 5240 S  0.0  5.2   0:05.05 httpd
19492 apache    40   0  271m 203m 5212 S  0.0  5.2   0:05.30 httpd
19506 apache    40   0  271m 203m 5188 S  0.0  5.2   0:10.28 httpd
19497 apache    40   0  271m 203m 5172 S  0.0  5.2   0:07.65 httpd
17527 apache    40   0  271m 203m 5240 S  0.0  5.2   0:05.03 httpd
19144 apache    40   0  271m 203m 5220 S  0.0  5.2   0:02.58 httpd
19145 apache    40   0  271m 203m 5152 S  0.0  5.2   0:02.60 httpd
17165 apache    40   0  271m 203m 5104 S  0.0  5.1   0:02.63 httpd
17900 apache    40   0  271m 203m 4576 S  0.0  5.1   0:05.08 httpd
17174 apache    40   0  271m 193m 5300 S  0.0  4.9   0:10.04 httpd
16742 apache    40   0  271m  84m 5468 S  0.0  2.1   0:20.03 httpd
19812 apache    40   0  100m  33m 4812 D  7.6  0.8   0:00.23 httpd
16741 apache    40   0  271m  18m 5700 S  0.0  0.5   0:08.16 httpd
 5095 root      40   0 84448  13m 4388 S  0.0  0.3   0:14.79 httpd
 4511 named     40   0 51340  10m 1128 S  0.0  0.3   2:13.22 named
 4697 mysql     40   0  153m 8964 2560 S  0.0  0.2   4:50.60 mysqld
16727 apache    40   0 73828 7752  444 S  0.0  0.2   0:00.00 httpd
 4245 sso       40   0 28268 3224 1628 S  0.0  0.1   0:00.10 sw-engine-cgi
25520 root      40   0 68164 3052  276 D  0.0  0.1   1:58.79 tar
25473 psaadm    40   0 38364 2564  852 S  0.0  0.1   0:01.69 sw-engine
25512 root      40   0 14112 2432  808 S  0.0  0.1   0:00.78 python
 4912 root      40   0 35160 1648  460 S  0.0  0.0   0:11.67 spamd
28823 root      40   0 12092 1556 1320 S  0.0  0.0   0:00.08 sshd
13713 root      40   0 12092 1444 1324 S  0.0  0.0   0:00.36 sshd
 6829 root      40   0 12092 1440 1320 S  0.0  0.0   0:01.65 sshd
 4240 sso       40   0 27204 1140  760 S  0.0  0.0   0:00.21 sw-engine-cgi
20409 qmailr    40   0  4908 1060  884 S  0.0  0.0   0:00.00 qmail-remote.mo
 7073 root      40   0  5112 1032  816 S  0.0  0.0   0:00.01 bash
20135 qmaild    40   0  4920 1032  864 S  0.0  0.0   0:00.00 qmail-smtpd
19755 qmaild    40   0  4920 1028  856 S  0.0  0.0   0:00.00 qmail-smtpd
13757 root      40   0  4992 1016  804 S  0.0  0.0   0:00.00 bash
29109 root      40   0  2416 1016  724 R  0.0  0.0   0:06.99 top
20133 qmaild    40   0  4920 1000  832 S  0.0  0.0   0:00.00 qmail-smtpd

I ran strace -p [pid] and found that those processes are doing normal Apache work...

Warner
  • would be helpful to label columns. The column order looks more like top than "ps ux" to me – freiheit Jul 19 '10 at 17:40
  • possible duplicate of [Apache uses 100% CPU. Can "ps" command tell me what it is doing?](http://serverfault.com/questions/161405/apache-uses-100-cpu-can-ps-command-tell-me-what-it-is-doing) – MikeyB Jul 19 '10 at 17:48
  • @freiheit it is top, not ps ux – Fahad Sadah Aug 02 '10 at 20:26

4 Answers

2

A complete Apache hang is very rare.

Have you read Apache's access and error logs around the times of the hangs? Does some specific URL get visited every time it happens?
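As a rough sketch of that check, the function below counts the most requested paths among the log lines whose timestamp starts with a given prefix. The name top_urls, the log path, and the timestamp in the usage comment are placeholders, and the field position assumes the Common or Combined Log Format.

```shell
# Print the ten most requested paths among access-log lines that contain
# a given timestamp prefix. Assumes Common/Combined Log Format, where the
# request path is the 7th whitespace-separated field.
top_urls() {
  logfile=$1   # path to an access log (placeholder)
  when=$2      # timestamp prefix as written in the log, e.g. "19/Jul/2010:14:4"
  grep -F -- "$when" "$logfile" | awk '{ print $7 }' | sort | uniq -c | sort -rn | head -n 10
}

# Usage (hypothetical path and time):
# top_urls /var/log/httpd/access_log "19/Jul/2010:14:4"
```

A path that dominates the list in the minutes before every hang is a good candidate for a closer look.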

Is your installation a basic Apache + PHP + MySQL setup, or do you have something more exotic installed, such as a PHP opcode cache like XCache?

How about httpd.conf? Have you set up some very long timeout values? And do you have KeepAlive on or off?
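A quick way to answer those questions is to list the relevant directives straight from the config file. This is only a sketch: show_timeouts is a made-up name, the path in the usage comment varies by distribution, and directives set in included files or per-vhost files will not show up unless you point it at those files too.

```shell
# Print the timeout- and KeepAlive-related directives set in a config file,
# with line numbers. Matching is case-insensitive; commented-out lines are
# skipped because the directive must be the first non-blank text on the line.
show_timeouts() {
  conf=$1   # e.g. /etc/httpd/conf/httpd.conf (assumed path)
  grep -Ein '^[[:space:]]*(Timeout|KeepAlive|KeepAliveTimeout|MaxKeepAliveRequests)[[:space:]]' "$conf"
}

# Usage (assumed path):
# show_timeouts /etc/httpd/conf/httpd.conf
```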

The apachetop command can also be very helpful during debugging.

EDIT: Sometimes a faulty redirection in either .htaccess or PHP code can cause some very dramatic server crashes. If you have a .htaccess file containing a line like

ErrorDocument 404 http://yourserver/notfound.html

and that file does not exist, Apache goes into a very rapid redirection loop, and things get messed up within seconds.

A proper ErrorDocument line should look like

ErrorDocument 404 /some/path/notfound.html
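With a full URL as the target, Apache answers with a redirect to the client instead of serving a local document, which is what makes the loop possible. To hunt for such lines across many sites, something like the sketch below could help. The function name find_remote_errordocs and the directory in the usage comment are placeholders, and it only looks at files literally named .htaccess.

```shell
# List ErrorDocument directives in .htaccess files that point at a full
# http:// or https:// URL instead of a local path. Output format is
# file:line-number:matching-line.
find_remote_errordocs() {
  docroot=$1   # directory tree to search (placeholder)
  find "$docroot" -type f -name .htaccess -exec \
    grep -EiHn '^[[:space:]]*ErrorDocument[[:space:]]+[0-9]+[[:space:]]+https?://' {} +
}

# Usage (hypothetical directory):
# find_remote_errordocs /var/www/vhosts
```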

Also the Apache rewrite module is very capable of crashing your server with some faulty rewrite rules. Mod_rewrite is voodoo. Damn cool, but still voodoo, and sometimes an extremely efficient WMD.

Janne Pikkarainen
  • It's a complete server hang (the server is out of memory). The server is 100% standard and is also running Plesk 9. No long timeout value and KeepAlive is off –  Aug 01 '10 at 13:43
  • OK, I added some ideas to my response. – Janne Pikkarainen Aug 01 '10 at 13:49
  • For the record, I installed apachetop, but since my log files are split by virtual domain, it doesn't help much. –  Aug 01 '10 at 13:57
  • apachetop -f /var/log/apache2/*access*.log or if the log files are in different directories, apachetop -f /var/log/hosts/*/access.log should make your day. – Janne Pikkarainen Aug 01 '10 at 14:10
  • Using /vhosts/*/access.log will take the first directory under /vhosts and ignore all the others –  Aug 09 '10 at 14:20
  • +100 for the apachetop note, that just helped me identify and fix several sites from XML-RPC Attacks. – Ash Jun 09 '16 at 06:46
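On the glob issue raised in the comments: the shell expands the pattern into several paths, but only the first one ends up as the argument to -f. As far as I know, apachetop accepts -f more than once, so one workaround is to repeat the flag per file. The helper below (apachetop_cmd is a made-up name) only prints the resulting command line so you can inspect it before running it. It does not quote paths, so it assumes log paths without spaces.

```shell
# Print an apachetop command line with one -f flag per log file argument.
# Assumption: the installed apachetop accepts multiple -f options.
apachetop_cmd() {
  cmd="apachetop"
  for f in "$@"; do
    cmd="$cmd -f $f"
  done
  printf '%s\n' "$cmd"
}

# Usage (glob taken from the comments above):
# apachetop_cmd /var/log/hosts/*/access.log
```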
1

It looks like they're all waiting on Disk I/O.

You can dig a little deeper by using strace -p <pid>
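Before attaching strace, it may be worth checking which processes are actually blocked on I/O. On Linux those show up in uninterruptible sleep, state D. A small sketch, where only_dstate is a made-up name and the input is expected as "STAT PID COMMAND" lines:

```shell
# Keep only processes in uninterruptible sleep (state D), which on Linux
# usually means they are blocked on I/O. Reads "STAT PID COMMAND" lines
# on stdin and prints "PID COMMAND" for each match.
only_dstate() {
  awk '$1 ~ /^D/ { print $2, $3 }'
}

# Usage:
# ps -eo stat= -o pid= -o comm= | only_dstate
```

The PIDs it prints are the ones worth passing to strace -p first.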

pjz
  • I found the problem (I hope). One website had a page that geolocated the visitor's IP using MaxMind GeoIP and its PHP library. The MaxMind GeoIP database is large (30 MB), and the page did not close the database file. Since I added an explicit close after the geolocation step, no further performance problems have been detected. –  Jul 19 '10 at 18:54
  • That was not the problem... I updated my question –  Aug 01 '10 at 12:56
  • Nope, if they'd be waiting for disk IO, they would be in D state, not S... – Phillipp Sep 28 '17 at 13:40
  • solved my problem after using this – Sohail Yasmin Dec 11 '18 at 10:07
1

There is already a thread with information that could help you.

I recommend first trying the accepted answer there about "server-status", and then what I suggested: changing the LogLevel.
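Once server-status is available, its machine-readable form can be polled from a script. The sketch below assumes Apache 2.x with mod_status enabled and mapped to /server-status on localhost; the URL and the function name worker_counts are assumptions, not something from the other thread.

```shell
# Extract the busy and idle worker counts from mod_status machine-readable
# output. Reads the text of /server-status?auto on stdin.
worker_counts() {
  awk -F': ' '
    $1 == "BusyWorkers" { busy = $2 }
    $1 == "IdleWorkers" { idle = $2 }
    END { print "busy=" busy, "idle=" idle }'
}

# Usage (assumed URL):
# curl -s 'http://localhost/server-status?auto' | worker_counts
```

Logging this once a minute shows whether the busy count climbs steadily before a hang.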

Raffael Luthiger
0

Set up server-status as described in the existing thread recommended above. The process sizes look excessive, which suggests you may have a memory leak.

I have found that setting MaxRequestsPerChild to 1000 or 100 can help when there are memory leaks or other problems in the code. This usually kills off the processes before they cause a problem, which may give you time to trace the cause.
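To judge whether recycling workers helps, you can sample the total resident memory of the httpd processes before and after changing MaxRequestsPerChild. A minimal sketch follows. The name summarize_rss is made up, the ps options are those of Linux procps, and summing RSS counts shared pages once per process, so treat the total as an upper bound.

```shell
# Sum resident set sizes given in KiB, one value per line on stdin, and
# print the process count and the total in MiB.
summarize_rss() {
  awk '{ total += $1; n++ } END { printf "%d processes, %.0f MiB resident\n", n, total / 1024 }'
}

# Usage (process name httpd taken from the top output in the question):
# ps -C httpd -o rss= | summarize_rss
```

If the total keeps growing between samples even with a low MaxRequestsPerChild, the leak theory becomes less likely.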

BillThor
  • I set the new value in the configuration 4 hours ago, and I think your suggestion works. I will let it run like that until tomorrow and let you know. –  Aug 03 '10 at 19:02
  • I was able to track down the problem using Apache's server-info page in extended mode (http://httpd.apache.org/docs/2.0/mod/mod_info.html). One website was targeted by spammers who filled its guestbook with tons of junk. Search engines then indexed that junk, bringing tons of traffic. –  Aug 09 '10 at 14:47
  • Pierre, "One website was the target of spammers that filled its guestbook of tons of junk." How did you figure that out with server-info? I have the same memory problem. – Kumar Jun 25 '15 at 08:36