
running out of ideas to explore. First off - let me warn you - I'm a programmer, not a systech :)

Here is the situation.

Dedicated server (LAMP) running a fair number of sites. The MySQL server is on a separate box.

Last couple weeks, performance has been steadily degrading to the point where I can no longer even remote into the box.

Looking into mod_status, there are a fair amount of processes taking up CPU resources. However, the URLs are all different... there is not a common pattern - so I can't narrow anything down to a particular script that might be getting stuck.

PHP is run as CGI.

The majority of the sites that are taking a while to run use the CakePHP framework.

Restart the server, we are down within a few minutes again...

Came across an error that said /var/tmp/ was full and sessions couldn't be written. However, there was still room? Lack of inodes perhaps? Currently in the process of having someone walk down to the box and clear tmp.
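One way to tell "out of space" from "out of inodes" is to look at both numbers for the filesystem that holds /var/tmp. The following is only a minimal sketch, assuming Python is available on the box; the helper name `usage` is made up for illustration. It reads the counters that `os.statvfs` exposes and reports both percentages.

```python
import os


def usage(path):
    """Return (percent of blocks used, percent of inodes used) for the
    filesystem that holds `path`. Hypothetical helper for illustration."""
    st = os.statvfs(path)
    # Some filesystems report 0 total inodes; treat that as "not applicable".
    blocks_pct = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks if st.f_blocks else 0.0
    inodes_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return blocks_pct, inodes_pct


if __name__ == "__main__":
    target = "/var/tmp"
    if os.path.isdir(target):
        blocks_pct, inodes_pct = usage(target)
        print("%s: %.1f%% of blocks used, %.1f%% of inodes used"
              % (target, blocks_pct, inodes_pct))
```

If the inode figure is near 100% while the block figure is not, the "full" error is most likely about inodes (for example, a huge number of tiny session files). Running `df -i /var/tmp` from a shell should show the same information.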

Could the lack of ability to write sessions be causing the php processes to hang forever, and eventually clog everything up?

Any other ideas that I might want to explore? I have been monitoring the sql server to see if it is returning huge datasets in any of the queries, and there is nothing notable in there....

It's only 11:21am here and I already need a drink :)

Ryan
  • As you are getting progressively poorer performance, this sounds like a memory leak. However, I'm no sysadmin, so perhaps post this over on serverfault. – ToonMariner Aug 15 '12 at 18:28
  • processes have urls? – Marc B Aug 15 '12 at 18:33
  • Ian - thanks, I will post over there as well. Marc - yes, but nothing common.. just random pages on a variety of sites. Only common denominator is that they are sites that are using cakephp. – Ryan Aug 15 '12 at 18:36

2 Answers


I assume it's a memory problem.

  1. Apache is eating a lot of RAM.

  2. PHP also has a lot of memory leaks. You should configure it to restart its worker processes after handling a fairly low number of requests (100 is a good number). Look in /etc/init.d/php-cgi (or similar) for a line like "PHP_FCGI_MAX_REQUESTS=20"; that's the limit. Also set a reasonable limit for the number of children, such as "PHP_FCGI_CHILDREN=15". I would also suggest using php-fpm if possible; that's much more stable and has fewer memory leaks.

TODO:

  1. Try to look for killed processes in your syslog (/var/log/syslog or /var/log/messages depending on distribution). There might be such a hint.
  2. To track the problem down, try to use "atop" (process monitor like top, but some more features) and press "p", that accumulates all statistics by process names. Have a look at what's eating up the RSIZE.
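If atop is not installed, roughly the same "accumulate by process name" view can be approximated from plain `ps` output. This is only a sketch under assumptions: it relies on the POSIX form `ps -A -o rss=,comm=` (resident size in kilobytes plus command name), and the function name `rss_by_name` is made up for illustration.

```python
import subprocess
from collections import Counter


def rss_by_name(ps_output):
    """Sum resident memory (KB) per command name from
    `ps -A -o rss=,comm=` style output. Hypothetical helper."""
    totals = Counter()
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        totals[parts[1].strip()] += int(parts[0])
    return totals


if __name__ == "__main__":
    try:
        out = subprocess.check_output(["ps", "-A", "-o", "rss=,comm="])
    except (OSError, subprocess.CalledProcessError):
        out = b""
    # Print the ten process names with the largest combined resident size.
    for name, kb in rss_by_name(out.decode("utf-8", "replace")).most_common(10):
        print("%10d KB  %s" % (kb, name))
```

A name such as httpd or php-cgi sitting at the top with a total close to the physical RAM would support the memory theory; if the totals are small, the bottleneck is probably elsewhere.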
SDwarfs
  • Will look into. Would a lack of memory result in high cpu usage? – Ryan Aug 15 '12 at 18:37
  • If the server starts swapping, most definitely, yes. –  Aug 15 '12 at 18:46
  • Yes, if your memory is full the system starts to use swap space. That means delays in the execution of programs. They get blocked until memory is transferred to disk and back... that way the CPU load goes up too. Keep an eye on "swap usage" (ideally there shouldn't be any) and WAITIO (displayed in "top" as "1.0%wa" or similar); high WAITIO means that processes are waiting for IO to finish (may be network/disk, but most of the time it is the hard drive). A defective hard drive may also be the problem. Try "hdparm -t /dev/sda", where /dev/sda is your HDD; if it's below 10MB/s, it's probably defective and causing problems. –  Aug 15 '12 at 18:46
  • Got the /var/tmp cleared up... performance seems to have been improving.. Memory doesn't seem to be an issue at the moment. We have 16 gigs on the box. (5 in usage at the moment). Continuing to investigate. – Ryan Aug 15 '12 at 19:41
  • Ok, what is "top" saying about WAITIO? What is the result of "hdparm"? –  Aug 15 '12 at 20:08
  • I do not see anything referring to WAITIO when I run "top"... hdparm is not available – Ryan Aug 15 '12 at 20:43
  • General copy/paste on top... last pid: 71619; load averages: 15.19, 10.93, 12.64 up 0+02:17:14 13:38:32 1466 processes:7 running, 1458 sleeping, 1 stopped Mem: 7422M Active, 1134M Inact, 2086M Wired, 1646M Buf, 5212M Free Swap: 8192M Total, 8192M Free – Ryan Aug 15 '12 at 20:43
  • Ok, the "load" seems to be quite high; this is the average (1, 5 and 15 minute averages, though) number of processes that want to execute something; it should be less than the number of (virtual) CPUs. The number of processes seems way too high... it seems that your requests are handled too slowly somehow, and eventually you get an overload situation => users abort requests and reload pages and cause even more load. The WaitIO info seems to be missing (which operating system is this?) –  Aug 15 '12 at 21:01
  • You could try "mysqladmin processlist" (shell command); this shows which SQL commands are currently being executed and possibly waiting for tables to be unlocked. Also have a look at /var/log/mysql.log; maybe you also need to repair your tables (command: "mysqlcheck -A"). –  Aug 15 '12 at 21:04
  • FreeBSD 7.4. MySQL is on another box that I don't have shell access for. I am checking the slow query log and there isn't too much in there. – Ryan Aug 15 '12 at 21:09
  • You can also connect to the mysql server via a mysql client and then execute "SHOW PROCESSLIST;" as sql statement. –  Aug 15 '12 at 21:17
  • Another interesting tidbit, watching mod_status: the scoreboard is getting filled up with "G"s ("gracefully finishing") that never close. I looked at the running processes in phpMyAdmin... nothing is really sitting there.. quick in and out – Ryan Aug 15 '12 at 21:21
  • Hm... many "gracefully closing" entries should mean that the job has been executed on the server but the rest of the data (which is in the send buffer) still needs to be sent to the client. Possibly there is a network problem (a lot of packet loss) that also drops the RST/FIN packets of the TCP connection. Check the network status via cmd: "mtr google.com" (install mtr, if needed) or try ping google.com and look for long delays or dropped packets. –  Aug 15 '12 at 21:35
  • A misconfigured (stateful) firewall could also be the problem. Ask some people whether they can access the server as normal, especially from outside of your local network (if it's an internet service). –  Aug 15 '12 at 21:37
  • Thanks Stefan... will look into it further. The mysql server is behind some sort of firewall or something, as it can only be accessed from the webserver in question. – Ryan Aug 15 '12 at 21:44
  • I meant you should check the network availability of the webserver (not the database server)... The firewall of the database server shouldn't be causing the many connections in the "gracefully closing" state on your apache server. This is caused by something between user/browser -> network -> apache/webserver. –  Aug 15 '12 at 22:27
  • Ah - yes.. No issue pinging server. It isn't in my local network anyways. – Ryan Aug 15 '12 at 23:17
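To watch whether the "G" slots keep piling up over time, the scoreboard can be counted from mod_status's machine-readable output (the `?auto` variant of the status page, which contains a `Scoreboard:` line). This is a rough sketch only: the URL `http://localhost/server-status?auto` is an assumption about where the status handler is mounted, and both function names are made up for illustration.

```python
from collections import Counter
from urllib.request import urlopen


def scoreboard_counts(status_text):
    """Count worker states from the 'Scoreboard:' line of
    mod_status '?auto' output. Hypothetical helper."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            return Counter(line.split(":", 1)[1].strip())
    return Counter()


def fetch_counts(url="http://localhost/server-status?auto"):
    """Fetch the status page and count states. The URL is an assumption;
    adjust it to wherever mod_status is exposed."""
    with urlopen(url, timeout=5) as resp:
        return scoreboard_counts(resp.read().decode("ascii", "replace"))
```

For example, `fetch_counts()["G"]` gives the number of gracefully finishing workers and `fetch_counts()["."]` the number of open slots; logging those two numbers once a minute would show how quickly the pool fills up after a restart.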

You really need to look from inside the box instead of outside, to see what resource is being consumed.

My guess would be that apache's process pool is exhausted (so no one can connect) or physical memory is exhausted (so performance falls off a cliff).

ddyer