Okay, I've got a nice dedicated server running CentOS 6 with 16 GB of RAM, dual Xeon processors, etc. However, I've been experiencing high loads due to MySQL. At random times the load will drop below 1.0, page generation time will be under 30 ms, and the site performs smoothly. This is with about 100 concurrent users, serving fewer than 200 pages/minute. However, 99% of the time it's very slow and has crazy high loads: usually at least 4, sometimes in the 100s. We didn't use to have this problem; this server used to be able to handle 400 concurrent users and 1,000 pages/minute without the load going above 1.5.
The first thing I did was implement DB caching in PHP with ADOdb. That helped a little but did not resolve the problem.
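ADOdb's cached queries (for example `CacheExecute($secs2cache, $sql)`) keep a result set for a given number of seconds, so they only save work when the same query repeats within that window; a query that is expensive on its own is still expensive on every cache miss. That may be why caching helped only a little. A minimal Python sketch of the TTL idea, assuming a hypothetical `run_query` callable standing in for the database layer (this is not ADOdb's code):

```python
import time
from typing import Any, Callable, Dict, Tuple


class QueryCache:
    """Hypothetical TTL cache for query results, keyed by the SQL text."""

    def __init__(self, run_query: Callable[[str], Any],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._run_query = run_query  # hypothetical callable that actually hits the database
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}  # sql -> (expiry time, result)

    def execute(self, ttl_seconds: float, sql: str) -> Any:
        now = self._clock()
        entry = self._entries.get(sql)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: served from cache, no database work
        result = self._run_query(sql)  # miss or expired: the full query cost is paid again
        self._entries[sql] = (now + ttl_seconds, result)
        return result
```

Only identical SQL strings share an entry, so queries that embed per-user or per-request values get little benefit from this kind of cache.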
I've looked all over the internet and can't seem to find out what is wrong. I asked a friend to take a look and he had no clue. I got my host to move us to a new machine, and the same problem came back after a few hours. We should not be getting this high a load for the traffic we are getting.
I'm starting to think it might have something to do with /tmp. I was able to get the load back to normal for a while by running 'tmpwatch --mtime --all 1 /tmp'. However, running it again did not help the next time the load spiked.
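'tmpwatch --mtime --all 1 /tmp' removes entries under /tmp whose modification time is more than 1 hour old. Before deleting again, it may help to see what is actually piling up there. A minimal, read-only Python sketch of that age test; `stale_files` is a hypothetical helper, and unlike tmpwatch it only lists regular files and removes nothing:

```python
import os
import stat
import time
from typing import List, Optional


def stale_files(root: str, hours: float, now: Optional[float] = None) -> List[str]:
    """List regular files under `root` not modified in the last `hours` hours.

    Read-only: mirrors the --mtime age test of tmpwatch but deletes nothing
    and skips non-regular files (sockets, FIFOs, etc.).
    """
    cutoff = (time.time() if now is None else now) - hours * 3600
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable between listing and stat
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for path in stale_files("/tmp", hours=1):
        print(path)
```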
If anyone has any idea what is wrong, I would greatly appreciate it. I'm not sure which metrics would be most useful, but I've included some that I think might help.
'top' output:
top - 22:02:36 up 1 day, 23:39, 1 user, load average: 4.01, 4.38, 4.50
Tasks: 233 total, 1 running, 231 sleeping, 0 stopped, 1 zombie
Cpu(s): 25.5%us, 2.0%sy, 0.0%ni, 70.5%id, 2.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16331836k total, 16034868k used, 296968k free, 375472k buffers
Swap: 18546680k total, 0k used, 18546680k free, 14421512k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31149 mysql 20 0 1589m 32m 6024 S 191.0 0.2 5:54.80 mysqld
27575 apache 20 0 312m 13m 3464 S 2.7 0.1 0:04.06 httpd
29427 apache 20 0 317m 18m 3484 S 2.7 0.1 0:02.76 httpd
25331 apache 20 0 311m 12m 3440 S 2.3 0.1 0:05.55 httpd
21331 apache 20 0 408m 15m 3676 S 2.0 0.1 0:08.57 httpd
24226 apache 20 0 314m 14m 3484 S 2.0 0.1 0:06.45 httpd
32352 apache 20 0 311m 12m 3424 S 2.0 0.1 0:01.01 httpd
32377 apache 20 0 312m 13m 3484 S 2.0 0.1 0:00.86 httpd
774 apache 20 0 312m 12m 3108 S 1.7 0.1 0:00.11 httpd
28165 apache 20 0 406m 12m 3588 S 1.7 0.1 0:03.76 httpd
30516 apache 20 0 311m 12m 3476 S 1.7 0.1 0:02.04 httpd
31019 apache 20 0 313m 13m 3436 S 1.7 0.1 0:01.68 httpd
31020 apache 20 0 314m 15m 3484 S 1.7 0.1 0:01.71 httpd
657 apache 20 0 311m 12m 3108 S 1.3 0.1 0:00.20 httpd
27731 apache 20 0 406m 12m 3572 S 1.3 0.1 0:03.69 httpd
28180 apache 20 0 313m 13m 3480 S 1.3 0.1 0:03.43 httpd
30565 apache 20 0 314m 14m 3488 S 1.3 0.1 0:02.07 httpd
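Reading the 'top' header against the 8 CPUs that 'iostat' reports: a 1-minute load of 4.01 is about 0.5 per CPU, and most of the memory shown as "used" is page cache (14421512k cached). In top's default Irix mode a process's %CPU is relative to one CPU, so mysqld's 191% is roughly 1.9 CPUs, about 24% of the machine, which is close to the 25.5%us figure. A Python sketch of that arithmetic; the function names are hypothetical, and counting free + buffers + cached as "available" is a rough approximation, not something top reports:

```python
import re
from typing import Dict


def _find(pattern: str, text: str) -> re.Match:
    """re.search that fails loudly if top's header format differs."""
    m = re.search(pattern, text)
    if m is None:
        raise ValueError(f"pattern not found in top header: {pattern}")
    return m


def summarize_top_header(header: str, ncpus: int, top_proc_cpu: float) -> Dict[str, float]:
    """Rough sanity numbers from a procps `top` header (memory figures in KiB)."""
    load1 = float(_find(r"load average: ([\d.]+)", header).group(1))
    mem = _find(r"Mem:\s+(\d+)k total,\s+(\d+)k used,\s+(\d+)k free,\s+(\d+)k buffers", header)
    total, _used, free, buffers = (int(g) for g in mem.groups())
    cached = int(_find(r"(\d+)k cached", header).group(1))
    # Approximation: buffers and page cache can be reclaimed, so count them as available.
    available = free + buffers + cached
    return {
        "load_per_cpu": load1 / ncpus,  # 1-minute load spread over all CPUs
        "mem_available_kib": available,
        "mem_available_pct": 100.0 * available / total,
        # Irix mode: a process's %CPU is per single CPU, so divide by the CPU count.
        "top_proc_share_pct": top_proc_cpu / ncpus,
    }


header = """top - 22:02:36 up 1 day, 23:39, 1 user, load average: 4.01, 4.38, 4.50
Mem: 16331836k total, 16034868k used, 296968k free, 375472k buffers
Swap: 18546680k total, 0k used, 18546680k free, 14421512k cached"""

# 8 CPUs per the iostat header; 191.0 is mysqld's %CPU from the process list.
print(summarize_top_header(header, ncpus=8, top_proc_cpu=191.0))
```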
'df' output:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg-root 461164576 45283168 392455568 11% /
tmpfs 8165916 0 8165916 0% /dev/shm
/dev/sda1 247919 72922 162197 32% /boot
/dev/mapper/vg-tmp 1032088 137344 842316 15% /tmp
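GNU df computes Use% as used / (used + available), rounded up, rather than used / 1K-blocks, because blocks reserved for root count as neither used nor available. That is how /tmp shows 15% with 137344 KiB used, leaving about 822 MiB available on a roughly 1 GB volume. A small Python sketch reproducing the column from the numbers above (the function name is hypothetical):

```python
def df_use_pct(used_kib: int, available_kib: int) -> int:
    """Reproduce df's Use% column: ceil(100 * used / (used + available))."""
    total = used_kib + available_kib
    if total == 0:
        return 0  # df prints "-" here; 0 is just a placeholder
    # Integer ceiling division avoids float rounding surprises.
    return (100 * used_kib + total - 1) // total


# (Used, Available) in 1K blocks, copied from the df output
rows = {
    "/": (45283168, 392455568),
    "/dev/shm": (0, 8165916),
    "/boot": (72922, 162197),
    "/tmp": (137344, 842316),
}
for mount, (used, avail) in rows.items():
    print(f"{mount}: {df_use_pct(used, avail)}% used, {avail / 1024:.0f} MiB available")
```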
'iostat' output:
Linux 2.6.32-220.el6.x86_64 (domain redacted) 09/05/2012 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
4.88 0.00 0.49 1.53 0.00 93.09
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 18.03 98.46 787.86 17060770 136515216
dm-0 76.33 98.15 605.19 17006740 104862680
dm-1 0.00 0.02 0.00 3176 0
dm-2 22.78 0.21 182.06 35730 31546312
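iostat's Blk_read/s and Blk_wrtn/s columns count 512-byte sectors, and the first report it prints is an average since boot, so it can hide short bursts; interval reports such as 'iostat -x 5' show current activity instead. A Python sketch converting the since-boot figures to KiB/s (the `rates` table is copied from the output; the function name is hypothetical):

```python
from typing import Dict, Tuple

SECTOR_BYTES = 512  # iostat's Blk_* columns count 512-byte sectors


def blocks_to_kib_per_s(blocks_per_s: float) -> float:
    """Convert a Blk_read/s or Blk_wrtn/s figure to KiB/s."""
    return blocks_per_s * SECTOR_BYTES / 1024


# (Blk_read/s, Blk_wrtn/s) per device, copied from the iostat report
rates: Dict[str, Tuple[float, float]] = {
    "sda": (98.46, 787.86),
    "dm-0": (98.15, 605.19),
    "dm-1": (0.02, 0.00),
    "dm-2": (0.21, 182.06),
}

for dev, (rd, wr) in rates.items():
    print(f"{dev}: read {blocks_to_kib_per_s(rd):.1f} KiB/s, "
          f"write {blocks_to_kib_per_s(wr):.1f} KiB/s")
```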