
We have several machines on Amazon EC2 of type c1.xlarge with 16 CPUs, running the Amazon AMI.

Details on the machine:
7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge

One of these machines has been showing a high load average since we ran the last yum upgrade a couple of weeks ago. We have not yet updated the other machines, and everything looks normal on them.

The strange thing is that the top command shows no hint of the cause of the load. CPU usage is 4.8%us, 1.1%sy, 0.0%ni, 94.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st (see below), and about 1.5 GB of memory is free.

Any idea what it could be, or where else we can check? Many thanks for the help.

#
# top
#
top - 07:57:42 up  4:18,  1 user,  load average: 1.36, 1.45, 1.47
Tasks: 131 total,   1 running, 130 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.8%us,  1.1%sy,  0.0%ni, 94.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7120092k total,  5644920k used,  1475172k free,   532888k buffers
Swap:        0k total,        0k used,        0k free,  3463936k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1557 mysql     20   0 1829m 374m 6448 S 14.3  5.4  11:15.09 mysqld
 6655 apache    20   0  416m  49m 3744 S  9.3  0.7   0:04.85 httpd
27683 apache    20   0  421m  54m 3708 S  9.0  0.8   0:00.99 httpd
 6682 apache    20   0  424m  57m 3788 S  8.3  0.8   0:03.81 httpd
16816 apache    20   0  419m  51m 3760 S  4.3  0.7   0:04.09 httpd
22182 apache    20   0  417m  50m 3756 S  1.7  0.7   0:06.34 httpd
  219 root      20   0     0    0    0 S  0.3  0.0   0:00.34 kworker/7:1
  699 root      20   0     0    0    0 S  0.3  0.0   0:00.40 kworker/3:1
    1 root      20   0 19376 1508 1212 S  0.0  0.0   0:00.29 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.71 ksoftirqd/0

"iostat" command on a healthy machine:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.97    0.03    4.46    0.19    0.14   86.23

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvdap1            1.60         0.69        55.38     587620   47254184
xvdfp2            2.64         1.10        61.04     934786   52091056
xvdfp4            0.86         0.19        41.72     163866   35601920
xvdfp1            4.37        36.59        73.89   31220810   63051504
xvdfp3            8.03         7.08        94.63    6045402   80749184

"iostat" command on problematic machine:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.29    0.04    5.55    0.26    0.11   84.74

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvdap1            2.13         3.34        68.85     246244    5077888
xvdfp1            7.60        74.31       104.88    5480362    7734840
xvdfp3           13.22        73.67       125.00    5433386    9218600
xvdfp4            1.11         0.76        65.08      55762    4799248
xvdfp2            4.16         3.31        99.17     243818    7313264

Does anyone know what I need to do?

Thanks

Oz.

2 Answers


With 8 virtual cores, a load average of 1.4 isn't high and is nothing to worry about (you're safe until the load average approaches 8). But the top output alone doesn't give nearly enough information to assist you further. And given that the machine has only been up for 4 hours, MySQL is probably still priming its caches.
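As a rough illustration of the per-core reasoning above, here is a minimal Python sketch. The helper name `load_per_core` is made up for this example; the figures 1.36 and 8 come from the top output and the instance description in the question.

```python
import os


def load_per_core(load_avg, cores):
    """Return the load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


if __name__ == "__main__":
    # Figures from the question: 1-minute load of 1.36 on 8 virtual cores.
    print(round(load_per_core(1.36, 8), 2))

    # Live values for the machine this runs on (Unix only).
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"1-min load per core: {load_per_core(one_min, cores):.2f}")
```

A value well below 1.0 per core means that, on average, most cores have nothing queued.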

It's probably I/O, and Amazon isn't providing quite the right information for iowait to be reported accurately (fairly typical for a VPS).

  1. Run iostat and post the results.
  2. Start graphing with Munin and report back with some statistics once the machine has been running for a few days.
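One quick way to test the I/O suspicion: on Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones (state R). The sketch below counts both by reading /proc. The function names are invented for this example, and it assumes a Linux /proc layout with per-thread stat files.

```python
import glob


def parse_state(stat_line):
    """Extract the one-letter task state from a /proc/.../stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so split on the last ')' before reading the state field.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_load_tasks(stat_lines):
    """Count tasks in the two states that contribute to Linux load average."""
    counts = {"R": 0, "D": 0}
    for line in stat_lines:
        state = parse_state(line)
        if state in counts:
            counts[state] += 1
    return counts


def read_stat_lines():
    """Read the stat file of every thread currently visible in /proc."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            continue  # the task exited between listing and reading
    return lines


if __name__ == "__main__":
    print(count_load_tasks(read_stat_lines()))
```

Running it a few times in a row gives a feel for whether the load comes from D-state tasks even when top reports 0.0%wa.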
Ben Lessani
  • @Oz. - you would be better deleting these comments, and editing your original question including these results. – Ben Lessani Jun 24 '12 at 16:15

High Load Average can be caused by I/O problems.

Try running iostat -x 10 10

And observe the await and %util numbers over time.

  • await: the average time (in milliseconds) that each I/O request took to complete. This includes the time the request spent waiting in the queue and the time the device took to service it.

  • %util: the percentage of time that the device spent servicing requests.
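To make the two definitions above concrete, here is a minimal Python sketch that derives the same figures from two samples of /proc/diskstats. The function names are invented for this example, and the field positions assume the standard Linux diskstats layout; iostat itself remains the tool to use in practice.

```python
import time


def parse_diskstats(text):
    """Map device name -> (reads, writes, ms_reading, ms_writing, ms_io)."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip lines that lack the expected counters
        # f[3]: reads completed, f[7]: writes completed,
        # f[6]: ms spent reading, f[10]: ms spent writing,
        # f[12]: ms during which the device had I/O in flight.
        stats[f[2]] = (int(f[3]), int(f[7]), int(f[6]), int(f[10]), int(f[12]))
    return stats


def await_and_util(before, after, interval_s):
    """Return (await in ms, %util) for one device between two samples."""
    ios = (after[0] - before[0]) + (after[1] - before[1])
    wait_ms = (after[2] - before[2]) + (after[3] - before[3])
    await_ms = wait_ms / ios if ios else 0.0
    util = 100.0 * (after[4] - before[4]) / (interval_s * 1000.0)
    return await_ms, util


def sample_devices(interval_s=10):
    """Sample /proc/diskstats twice and report await and %util per device."""
    with open("/proc/diskstats") as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        second = parse_diskstats(f.read())
    return {dev: await_and_util(first[dev], second[dev], interval_s)
            for dev in second if dev in first}
```

Calling `sample_devices(10)` on each machine and comparing the numbers for the xvdf partitions would show whether requests are queuing on the problematic one.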

Adi Dembak