
This is what WHM's Apache status page reports:

Current Time: Sunday, 23-Dec-2012 05:13:40 CST
Restart Time: Saturday, 22-Dec-2012 13:38:12 CST
Parent Server Generation: 9
Server uptime: 15 hours 35 minutes 28 seconds
Total accesses: 3444470 - Total Traffic: 2.1 GB
CPU Usage: u40.86 s113.4 cu748.01 cs0 - 1.61% CPU load
61.4 requests/sec - 38.9 kB/second - 649 B/request
110 requests currently being processed, 0 idle workers 

I have increased the maximum connections and maximum servers in WHM to 1500 and 3000 respectively.

The server uses the hard disk heavily for caching. It only has a 10 Mbps connection, but I haven't bothered to upgrade that because the server is only pushing 38.9 kB/second.

If the bottleneck really is IO, how can I check?

The server also curls other sites a lot and caches the results.

The server is very responsive, but there is a little latency.

IO seems to be the issue. Here is the output of iostat -xdk 1 20:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               1.81   413.86    8.63  201.33   190.15  2463.45    25.28    46.85  223.00   3.79  79.68
sdb               0.00     0.00    0.00    0.00     0.02     0.00     8.07     0.00    0.68   0.68   0.00
sdd               0.00     0.00    0.00    0.00     0.02     0.00     8.07     0.00    0.73   0.72   0.00
sdc               0.00     0.00    0.00    0.00     0.02     0.00     8.07     0.00    0.78   0.78   0.00
dm-0              0.00     0.00    1.94  140.75    49.18   562.97     8.58    23.97  168.00   3.88  55.35
dm-1              0.00     0.00    0.00    0.00     0.02     0.00     8.00     0.00    6.65   2.25   0.00
dm-2              0.00     0.00    8.52  475.11   140.85  1900.43     8.44    47.55   98.32   1.63  78.97

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   292.00    6.00  131.00   244.00  1668.00    27.91     5.14    6.53   2.24  30.70
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00  165.00     0.00   660.00     8.00     5.14    3.37   0.21   3.40
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    5.00  394.00   236.00  1576.00     9.08     1.55    3.92   0.67  26.70

That %util often goes to 100%, so that seems to be the bottleneck.
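A quick way to confirm this without eyeballing the whole table is to filter the %util column out of the iostat output. This is a sketch, run here against one pasted sample line so the pipeline works even without sysstat installed; in practice you would pipe live iostat -xdk 1 output into the same awk filter.

```shell
# Flag devices whose %util (last field) exceeds a threshold, printing the
# await value (third-from-last field) alongside. Sample data is piped in
# so the pipeline runs as-is; replace the printf with: iostat -xdk 1
printf 'sda 1.81 413.86 8.63 201.33 190.15 2463.45 25.28 46.85 223.00 3.79 79.68\n' |
awk '{ if ($NF + 0 > 70) print $1 " is " $NF "% utilized (await " $(NF-2) " ms)" }'
# prints: sda is 79.68% utilized (await 223.00 ms)
```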

vmstat doesn't seem to show a problem:

root@host [/var/log]#  vmstat 1 20
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
12 21      0 1148732 1160660 25192080    0    0    12   155   12   16 24 17 33 26  0
15  0      0 1281500 1160680 25193120    0    0    44  4568 15117 5501 31 19 24 27  0
12  3      0 1313904 1160684 25193728    0    0   104  1576 15960 5996 32 22 45  1  0
 7 10      0 1322328 1160692 25194140    0    0    16  3024 14354 5274 28 19 20 33  0
 6 12      0 1251420 1160704 25194848    0    0    96   452 13551 5208 24 19 32 26  0
20  0      0 1312052 1160708 25195592    0    0    76  4092 14885 5727 28 19 50  3  0
 3  0      0 1341072 1160728 25196652    0    0   456  3888 13056 5113 24 15 57  4  0
 6  1      0 1302052 1160728 25197448    0    0   188   936 11235 4372 20 15 66  0  0
11  9      0 1267768 1160744 25197872    0    0    16  2388 14423 5160 26 20 34 21  0
 5  0      0 1355152 1160748 25198496    0    0    36   504 12269 5302 19 14 52 15  0
 8  0      0 1323712 1160752 25199456    0    0    52  4032 12713 4779 22 16 61  0  0
 7  0      0 1350484 1160760 25199872    0    0    72  2788 13692 5086 25 17 54  4  0
 6  3      0 1334872 1160760 25200320    0    0     8  1088 12882 5193 23 17 60  0  0
 6 10      0 1266724 1160772 25200724    0    0    24  1940 13067 4705 25 19 39 17  0
 6  0      0 1315404 1160776 25201176    0    0    28  1428 11883 4914 19 14 46 21  0
11  0      0 1309244 1160784 25201724    0    0     0  2612 13001 4905 25 17 58  0  0
 4  0      0 1349536 1160796 25202204    0    0    12  2240 13124 4900 24 17 58  2  0
12  1      0 1322520 1160800 25202964    0    0   464  1268 13991 5733 26 19 54  0  0
 5 12      0 1301112 1160804 25203492    0    0    36  2172 13427 4956 25 17 38 20  0
 3  1      0 1374288 1160808 25203780    0    0    96   772 13360 5692 24 16 35 25  0

mpstat seems okay:

root@host [/var/log]# mpstat -P ALL
Linux 2.6.32-279.19.1.el6.x86_64 (host.buildingsuperteams.com)  12/23/2012      _x86_64_        (16 CPU)

06:17:20 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
06:17:20 AM  all   24.23    0.10   16.48   25.59    0.01    0.31    0.00    0.00   33.29
06:17:20 AM    0   24.18    0.09   17.00   34.98    0.00    0.16    0.00    0.00   23.59
06:17:20 AM    1   34.84    0.02   28.32   17.70    0.00    3.39    0.00    0.00   15.74
06:17:20 AM    2   26.35    0.04   20.08   26.29    0.00    0.01    0.00    0.00   27.22
06:17:20 AM    3   19.17    0.03   15.51   29.01    0.00    0.05    0.00    0.00   36.22
06:17:20 AM    4   17.64    0.28    9.33   35.08    0.00    0.26    0.00    0.00   37.42
06:17:20 AM    5   31.61    0.08   24.72   17.62    0.00    0.05    0.00    0.00   25.91
06:17:20 AM    6   24.38    0.07   19.06   20.42    0.00    0.03    0.00    0.00   36.04
06:17:20 AM    7   19.59    0.04   12.55   22.29    0.00    0.02    0.00    0.00   45.50
06:17:20 AM    8   14.21    0.12    8.60   38.27    0.00    0.44    0.00    0.00   38.36
06:17:20 AM    9   34.76    0.20   22.08   23.52    0.19    0.27    0.00    0.00   18.98
06:17:20 AM   10   26.13    0.06   16.03   22.77    0.00    0.01    0.00    0.00   35.00
06:17:20 AM   11   20.32    0.08   10.69   24.18    0.00    0.01    0.00    0.00   44.72
06:17:20 AM   12   16.99    0.21    8.50   35.72    0.00    0.17    0.00    0.00   38.40
06:17:20 AM   13   31.21    0.08   23.08   18.30    0.00    0.01    0.00    0.00   27.32
06:17:20 AM   14   25.72    0.06   16.95   21.02    0.00    0.01    0.00    0.00   36.25
06:17:20 AM   15   20.60    0.09   11.18   22.40    0.00    0.01    0.00    0.00   45.73

4 Answers

iotop is a pretty good tool for understanding the IO usage on your machine and which processes are generating it.

To install on RHEL/CentOS flavors:

 # yum install iotop -y

For flavors like Ubuntu:

 # apt-get install iotop
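
Once installed, a couple of invocations are particularly useful (these are standard iotop flags: -o shows only processes actually doing IO, -a accumulates totals since start, -b and -n run it non-interactively; iotop needs root):

```shell
# interactive: only processes currently doing IO, with accumulated totals
iotop -o -a

# batch mode: take 3 samples, suitable for capturing into a log file
iotop -b -o -n 3
```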

You should never use apachectl to measure the performance of the system. That is Apache's point of view, which may be completely wrong about how the rest of the operating system is performing.

iostat, part of the sysstat package, can measure IO performance. If you want to find out which specific process is generating the IO, you can also use iotop (available through the EPEL repository; though I'd guess it would say "apache"). From iostat, you want %util as low as possible, which in turn gives you a very low await value.

Your mpstat does NOT appear to be fine. Again, it shows high IO usage (%iowait). For websites in general, you want %iowait under 1% for the site to stay responsive. You are also showing a fairly high proportion of system time for a typical Apache environment, but there is insufficient data to figure out why at the moment.

Although not part of what was asked, you should familiarize yourself with top as your most basic system diagnosis tool, as it gives an overall glance at every aspect of the system. The most important part of top's output is literally at the top (which, ironically, you left out of your pastebin).

Lastly, if by "maximum server" you mean Apache's MaxClients setting: 3000 is far too high for almost any system. I don't think even a half-million-dollar machine could handle that many Apache processes, and you'd be in a real pickle if Apache decided to shoot up its server count for any reason. The ideal number can only be found by testing the specific application on the specific machine. Basically, your max servers multiplied by the memory each server uses should not exceed the RAM available to Apache (not including swap, since you don't want to hit swap all the time, and only counting what's left after the OS and other services take their share).
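That sizing rule can be sketched with hypothetical numbers (the RAM figures and the per-process size below are assumptions for illustration only; measure your own average Apache process RSS with ps or top before applying this):

```shell
# MaxClients sizing sketch: (RAM available to Apache) / (RSS per process)
total_ram_mb=32768     # assumed: a 32 GB machine
reserved_mb=4096       # assumed: OS, MySQL, other services
apache_rss_mb=45       # assumed: average resident size of one Apache process

max_clients=$(( (total_ram_mb - reserved_mb) / apache_rss_mb ))
echo "$max_clients"    # prints 637 -- nowhere near 3000
```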


110 requests currently being processed, 0 idle workers

...

I have increased the maximum connection and maximum server in whm to 1500 and 3000 respectively

As Peter says, there is rather a lot of IO going on here, but I don't think that's the only problem. Why doesn't your server have lots of idle workers? 16 cores? This is a bad setup; it makes no sense to use big iron for webserving, and setting ServerLimit much higher than MaxClients doesn't make much sense either. It looks like something is constraining the number of Apache workers; we'd need to see your core settings from httpd.conf.

I suspect that the IRQ balancing is not optimal, although the application workload itself looks evenly distributed across the CPUs.

Why my server has high load despite mere 1.5% CPU usage

But you don't give any metrics for load.

As Peter says, you should start by looking at top.

The server also curl other sites a lot and cache the result....Server is very responsive but there is a little latency.

So is the latency due to the remote access? Something else?

You're saying there's a problem here, but without knowing what problem you are trying to solve, it's difficult to give advice. Certainly there are a lot of writes going on, and the data pattern suggests lots of very small chunks (similarly, your HTTP traffic looks strange at 649 B/request), but without knowing a lot more about what's happening here it's impossible to say more.


I filed a ticket with cPanel.

The competent guy there told me that the problem was kjournald writing 5-10 MB each time.

I am not really sure why it wrote so much.

I moved to SSD and it sort of works.

Basically I needed to run iotop -o -a (iostat shows per-device, not per-process, statistics) and see that kjournald was the culprit.

It causes so many IO writes that the disk utilization is constantly at 100%.
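For what it's worth, kjournald is the ext3 journaling thread, so its writes come from filesystem commits triggered by everything else on the box. Independent of the move to SSD, one standard ext3 knob worth knowing (an assumption about this setup, not something from the cPanel ticket) is the journal commit interval, which defaults to 5 seconds:

```shell
# flush the ext3 journal every 30 s instead of every 5 s: fewer journal
# writes, at the cost of up to 30 s of work lost on a crash (needs root)
mount -o remount,commit=30 /

# the equivalent /etc/fstab entry (device and mountpoint are placeholders):
# /dev/mapper/vg-root  /  ext3  defaults,noatime,commit=30  1 1
```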
