Does this top look healthy to you?

Question

I run a server with the following specs:

Intel i7 920
8 GB RAM
Linux 2.6.32-25-server #44-Ubuntu 10.04 SMP Fri Sep 17 21:13:39 UTC 2010 x86_64 GNU/Linux
75 Apache processes
Low-end hardware RAID-1 with 2 disks

Historically, all our scaling problems have been disk-bound, but we currently see higher load numbers than before, especially since updating to Ubuntu 10.04. The server handles around 50 requests per second. Swap is not used and should not be active. The MySQL dataset is a few gigabytes, but access should be fairly well optimized.

> top
top - 10:42:50 up 16 days, 18:49,  1 user,  load average: 20.02, 16.17, 11.44
Tasks: 277 total,   4 running, 273 sleeping,   0 stopped,   0 zombie
Cpu0  : 38.6%us,  3.3%sy,  0.0%ni, 58.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 37.9%us,  3.3%sy,  0.0%ni, 58.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 25.9%us,  3.0%sy,  0.0%ni, 69.5%id,  1.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu3  : 23.5%us,  2.0%sy,  0.0%ni, 67.9%id,  0.0%wa,  0.0%hi,  6.6%si,  0.0%st
Cpu4  : 16.4%us,  1.3%sy,  0.0%ni, 82.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  : 15.3%us,  1.3%sy,  0.0%ni, 83.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  : 14.3%us,  1.0%sy,  0.0%ni, 84.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  2.3%us,  0.6%sy,  0.0%ni, 97.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8187668k total,  8117276k used,    70392k free,   178920k buffers
Swap:  4198968k total,     2084k used,  4196884k free,  6159328k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
32216 mysql     20   0 2026m 788m 4132 S   41  9.9   1292:40 mysqld
 8104 www-data  20   0  491m 106m  95m S    4  1.3   1:57.62 apache2
27072 www-data  20   0  684m 112m 101m S    4  1.4   2:51.47 apache2
 3391 www-data  20   0  683m 109m  98m S    4  1.4   2:22.29 apache2
16822 www-data  20   0  682m 114m 104m S    4  1.4   3:33.05 apache2
27068 www-data  20   0  555m 113m 102m S    4  1.4   2:53.77 apache2
27118 www-data  20   0  683m 119m 106m S    4  1.5   4:41.48 apache2
 1036 www-data  20   0  685m 112m 100m S    3  1.4   2:27.24 apache2
 3503 www-data  20   0  556m  81m  70m S    3  1.0   0:33.77 apache2
29803 www-data  20   0  682m 111m 101m S    3  1.4   2:47.09 apache2
 1345 www-data  20   0  491m 115m 104m S    3  1.4   4:04.62 apache2
 3001 www-data  20   0  379m 109m  98m S    3  1.4   2:13.36 apache2
[... 75 Apache processes with similar specs, but less CPU]

My question is: do you generally see any problems with the high load numbers? The response time has increased, but only by ~30%. Do the load numbers include disk activity to some extent? Do you have any comments on what I should focus on when optimizing? Thank you very much!

> iotop
Total DISK READ: 179.70 K/s | Total DISK WRITE: 1735.81 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
16512 be/4 mysql       0.00 B/s   22.94 K/s  ?unavailable?  mysqld
20701 be/4 mysql       0.00 B/s    0.00 B/s  ?unavailable?  mysqld
21556 be/4 mysql       0.00 B/s   22.94 K/s  ?unavailable?  mysqld
28998 be/4 www-data    0.00 B/s    3.82 K/s  ?unavailable?  apache2 -k start
12771 be/4 mysql       0.00 B/s    3.82 K/s  ?unavailable?  mysqld
16824 be/4 www-data    0.00 B/s    3.82 K/s  ?unavailable?  apache2 -k start
 2700 be/4 mysql       0.00 B/s    7.65 K/s  ?unavailable?  mysqld
 3074 be/4 mysql      22.94 K/s    0.00 B/s  ?unavailable?  mysqld
17585 be/4 mysql       0.00 B/s   15.29 K/s  ?unavailable?  mysqld
30723 be/4 mysql       7.65 K/s    0.00 B/s  ?unavailable?  mysqld
29906 be/4 www-data    0.00 B/s    3.82 K/s  ?unavailable?  apache2 -k start
29907 be/4 mysql       0.00 B/s   15.29 K/s  ?unavailable?  mysqld
13547 be/4 www-data    0.00 B/s    3.82 K/s  ?unavailable?  apache2 -k start
 7444 be/4 www-data    0.00 B/s    3.82 K/s  ?unavailable?  apache2 -k start
 1944 be/4 mysql     149.11 K/s    0.00 B/s  ?unavailable?  mysqld
16825 be/4 mysql       0.00 B/s    7.65 K/s  ?unavailable?  mysqld
32223 be/4 mysql       0.00 B/s    3.82 K/s  ?unavailable?  mysqld
 7801 be/4 www-data    0.00 B/s    3.82 K/s  ?unavailable?  apache2 -k start
 5808 be/4 mysql       0.00 B/s   11.47 K/s  ?unavailable?  mysqld
 8104 be/4 www-data    0.00 B/s    3.82 K/s  ?unavailable?  apache2 -k start
18890 be/4 www-data    0.00 B/s    0.00 B/s  ?unavailable?  apache2 -k start
    1 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  init
    2 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [kthreadd]
    3 rt/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [migration/0]

Should I add it there as well? I wasn't aware of ServerFault before... — smint, Oct 07 '10 at 09:27

score 3 · Answer 1 · answered Oct 07 '10 at 10:43

On Linux, the load average includes processes in uninterruptible sleep (which includes waiting on disk access). Your top output doesn't seem to indicate a lot of I/O wait time, however. Since top percentages are averaged over the refresh interval, as a next step I might run top with a high update frequency (maybe -d.1 or -d.5) and look for spikes in I/O wait that don't show up at the default polling frequency.
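
Another way to look for short I/O wait spikes is to sample the aggregate `cpu` line of `/proc/stat` at a short interval and compute the iowait share of each interval. Here is a minimal Python sketch, assuming a Linux `/proc` filesystem. The function names, the 0.2 s interval and the 10% threshold are illustrative choices, not values from the question.

```python
import time


def parse_cpu_line(line):
    """Return the jiffy counters from the aggregate 'cpu ...' line of /proc/stat.

    Only the first eight counters (user nice system idle iowait irq softirq
    steal) are kept, because guest time is already included in user time.
    """
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:9]]


def iowait_percent(prev, cur):
    """Percentage of the interval between two samples spent in iowait (index 4)."""
    deltas = [c - p for p, c in zip(prev, cur)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_counters(path="/proc/stat"):
    # The aggregate 'cpu' line is the first line of /proc/stat.
    with open(path) as f:
        return parse_cpu_line(f.readline())


def watch(interval=0.2, threshold=10.0, samples=50):
    """Print a line for every interval whose iowait share reaches the threshold."""
    prev = read_cpu_counters()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_cpu_counters()
        pct = iowait_percent(prev, cur)
        if pct >= threshold:
            print(f"{time.strftime('%H:%M:%S')} iowait {pct:.1f}%")
        prev = cur
```

Calling `watch()` on the server during a busy period would print only the intervals where iowait crosses the threshold. If it prints nothing, disk wait is probably not what is driving the load.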

score 1 · Answer 2 · answered Oct 07 '10 at 12:38

The standard system metrics (load, CPU, memory, etc.) are usually good indicators of how the performance of a system is constrained, but ultimately performance is all about how quickly it can service a request. In practice it's a good idea to monitor these metrics and set thresholds, but they are only indicative of the actual performance of the system.

I think the architecture could be better. At a rough guess, the cost of the server you describe could have bought a 4 GB/dual-processor/RAID 1+(5/0) machine for the database and at least 2 low-spec machines to run the webservers on (I'm guessing there's mod_php or mod_perl in there somewhere too), which would probably be significantly faster.

Certainly it seems to be the mysqld process that's causing most of the pain here, but it looks like Apache is doing rather a lot of I/O. How much of your memory is getting used for I/O cache? The RSS for these Apache processes also looks high (the VIRT size too, but that's probably a consequence of the high RSS): approx. 10 times the value on the nearest LAMP box I could find.
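
The I/O cache question can be answered from the `Mem:` and `Swap:` lines of the top output in the question. Here is a small Python sketch of the arithmetic, assuming the old procps layout shown there (buffers on the `Mem:` line, cached on the `Swap:` line). The function name is made up for illustration.

```python
import re


def cache_summary(top_text):
    """Extract memory figures (in KiB) from old-style top 'Mem:'/'Swap:' lines."""
    values = {}
    for line in top_text.splitlines():
        if line.startswith(("Mem:", "Swap:")):
            prefix = line.split(":")[0].lower()
            # Each figure looks like "8187668k total".
            for amount, label in re.findall(r"(\d+)k (\w+)", line):
                values[f"{prefix}_{label}"] = int(amount)
    # Old top prints the page cache figure on the Swap: line.
    cache = values["mem_buffers"] + values["swap_cached"]
    return {
        "cache_kib": cache,
        "cache_pct": 100.0 * cache / values["mem_total"],
        "used_excluding_cache_kib": values["mem_used"] - cache,
    }


# The two lines copied from the top output in the question.
sample = """\
Mem:   8187668k total,  8117276k used,    70392k free,   178920k buffers
Swap:  4198968k total,     2084k used,  4196884k free,  6159328k cached
"""
summary = cache_summary(sample)
print(summary)
```

For the figures in the question this works out to roughly 6.0 GiB (about 77% of RAM) in buffers and page cache, with about 1.7 GiB used by everything else, so the tiny "free" number on its own is not a sign of memory pressure.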

I'd recommend following the usual recipe here, but looking at the mysql stuff first:

MySQL - have you got slow query logging enabled? Have you analysed it to identify potential database optimizations?
Have you run mysqltuner against your installation?
HTTP caching - are you sending good caching information for static content? Disabling conditional requests?
Why are your Apache processes so big? Do you really need all those modules?
What's the range of RTTs to your users? Have you got compression enabled on static text/html content and for script output?
If you're running a PHP site, have you got an opcode cache (e.g. APC, ionCube, Zend) running?
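
The HTTP caching and compression points in the list above can be spot-checked by looking at the response headers for a few static URLs. A rough Python sketch using only the standard library follows. The helper names and the keys of the returned dictionary are illustrative, and it only reports which headers are present rather than judging whether their values are sensible.

```python
import urllib.request


def caching_report(headers):
    """Summarise the caching/compression headers of one HTTP response.

    `headers` is any mapping of header name -> value.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        # Is there any freshness information a browser or proxy can use?
        "cacheable_hint": "cache-control" in lowered or "expires" in lowered,
        # Validators that make conditional requests possible.
        "validators": [h for h in ("etag", "last-modified") if h in lowered],
        # Was the body sent compressed?
        "compressed": lowered.get("content-encoding", "").lower() in ("gzip", "deflate"),
    }


def fetch_headers(url):
    """Fetch `url` while advertising gzip support and return the response headers."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())
```

Usage would be something like `caching_report(fetch_headers("http://example.com/static/site.css"))`, where the URL is a placeholder for one of your own static files.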

HTH

score -2 · Answer 3 · answered Oct 07 '10 at 12:04

I'd personally worry about the high CPU usage of MySQL. top is only a snapshot, though; if you see the CPU for MySQL consistently pegged at 50%, I would take some steps to find out why.

Load tends to grow exponentially. The time it took MySQL to reach 50% will be much more than the time it takes to hit 100%.

Evert

By default top shows CPU usage as a percentage of a single CPU, so 41% really means it's only using about 5% of the total capacity (OK, in some cases the workload can't be shared across multiple CPUs, but it's certainly not enough to be constraining the performance) – symcbean Oct 07 '10 at 12:41
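
The arithmetic behind that 5% figure, as a tiny Python sketch. It assumes the 8 logical CPUs (Cpu0 to Cpu7) shown in the top output, and the function name is only for illustration.

```python
def share_of_total_capacity(per_cpu_percent, logical_cpus):
    """Convert a top %CPU figure (100% = one logical CPU) to a share of the whole machine."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return per_cpu_percent / logical_cpus


# mysqld is shown at 41% in the question, and top lists Cpu0..Cpu7.
mysqld_share = share_of_total_capacity(41, 8)
print(f"mysqld uses about {mysqld_share:.1f}% of total CPU capacity")
```

So mysqld at 41% is a bit over 5% of the machine, which matches the mostly idle per-CPU lines in the top output.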
