
We have a problem with very slow disk response on our server. I checked iostat (iostat -d -x 30) and am having trouble interpreting its output:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               1.04   396.31    6.60   57.44   382.47  3649.21    62.95    10.31  160.87   8.64  55.36
sda               6.26   391.15   16.16   62.75  1810.79  3649.22    69.19     2.97   37.66   1.79  14.13
md0               0.00     0.00    0.55    0.01    16.88     0.08    30.11     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.02    0.07     1.10     0.54    18.31     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.02    0.04     0.13     0.34     8.00     0.00    0.00   0.00   0.00
md3               0.00     0.00   29.48  453.28  2175.15  3643.46    12.05     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00    56.15    0.70   81.34    12.00  1110.03    13.68    47.56  600.17   5.23  42.89
sda               0.00    51.02    0.47   81.37     4.53  1059.38    13.00     0.32    3.95   0.69   5.64
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    1.17   47.45    16.53   379.61     8.15     0.00    0.00   0.00   0.00

The first report is the initial (historical, since boot) statistics that iostat displays; the second covers the following 30 seconds.
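As a sanity check on how the columns relate to each other, here is a minimal Python sketch. It assumes that avgqu-sz follows Little's law (request rate times await) and that %util is the request rate times svctm; this is my understanding of how iostat derives these columns, not something stated in its output. The function name is made up for illustration, and the numbers are copied from the 30-second report above.

```python
def derived_from_iostat(r_s, w_s, await_ms, svctm_ms):
    """Recompute queue length and utilisation from the other iostat columns."""
    iops = r_s + w_s                       # completed requests per second
    queue_len = iops * await_ms / 1000.0   # Little's law: L = rate * time in system
    util_pct = iops * svctm_ms / 10.0      # busy share = rate * service time, in percent
    return queue_len, util_pct


# Values copied from the 30-second sample in the question.
sdb = derived_from_iostat(r_s=0.70, w_s=81.34, await_ms=600.17, svctm_ms=5.23)
sda = derived_from_iostat(r_s=0.47, w_s=81.37, await_ms=3.95, svctm_ms=0.69)

print("sdb: queue ~%.2f, util ~%.2f%%" % sdb)
print("sda: queue ~%.2f, util ~%.2f%%" % sda)
```

Under these assumptions the recomputed values land close to the reported avgqu-sz and %util for both disks, which suggests the columns are at least internally consistent and that the difference between sda and sdb is not a reporting glitch.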

Why is await for sdb so much higher than for sda? Partly because svctm is also higher (svctm is a component of await and also influences the queue length). But why would that be, if the disks are in a mirror? They are exactly the same model, and smartctl does not report any problems or significant differences in the counters:

SDA:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   145   145   021    Pre-fail  Always       -       9716
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       71
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       14623
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       69
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       68
193 Load_Cycle_Count        0x0032   113   113   000    Old_age   Always       -       262965
194 Temperature_Celsius     0x0022   126   114   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SDB:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   145   145   021    Pre-fail  Always       -       9708
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       67
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       14622
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       65
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       63
193 Load_Cycle_Count        0x0032   113   113   000    Old_age   Always       -       261839
194 Temperature_Celsius     0x0022   128   115   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

/etc/fstab:

proc            /proc           proc    defaults        0       0
/dev/md0        /               ext3    relatime,errors=remount-ro 0       1
/dev/md1        /var            ext3    relatime        0       2
/dev/md2        none            swap    sw              0       0
/dev/md3        /vz             ext3    relatime        0       3
/dev/hda        /media/cdrom0   udf,iso9660 user,noauto 0       0

Here are some measurements from iostat -d -x 2 (every two seconds) under heavy load. Both disks can build up a long queue and a long wait time, but sda manages to drain its queue again, while sdb stays at a high wait time. This is strange, as the disks are identical and form a RAID-1 (mirror).

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00    6.00     0.00  6144.00  1024.00    21.40 4545.00 166.67 100.00
sda               0.00     0.00    2.00    1.00    16.00     8.00     8.00     0.49  390.00  75.33  22.60
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    1.50    0.50    12.00     4.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00  1405.00    0.50   23.00     4.00 10632.00   452.60    18.96 1889.62  41.62  97.80
sda               0.50  1401.50    1.50   37.50   120.00 11512.00   298.26     4.29  110.00   3.13  12.20
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    2.50 1439.00   124.00 11512.00     8.07     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00  1995.50    0.00   29.00     0.00  5304.00   182.90    13.64  873.31  34.34  99.60
sda               0.50  1986.50    6.50   28.50   512.00  1664.00    62.17     0.57    7.14   1.89   6.60
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    7.00 2046.00   512.00 16368.00     8.22     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00   930.00    0.00   18.50     0.00  1192.00    64.43    92.52  859.68  54.05 100.00
sda               0.00   928.50    0.00   35.50     0.00 18192.00   512.45    51.52  701.97  28.17 100.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00  946.50     0.00  7572.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   16.00     0.00  8976.00   561.00    56.14 2710.38  62.50 100.00
sda               0.00     0.00    0.00   13.50     0.00  4084.00   302.52     6.26 2457.63  47.56  64.20
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   10.00     0.00 10240.00  1024.00    33.75 4877.20 100.00 100.00
sda               0.00     0.00    0.50    0.00     4.00     0.00     8.00     0.01   16.00  16.00   0.80
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.50    0.00     4.00     0.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00  3245.50    1.50   31.50   208.00 12756.00   392.85    64.57 2644.30  30.24  99.80
sda               0.00  3245.00    2.00   60.50   108.00 26444.00   424.83    17.03  272.42   4.61  28.80
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    3.50 3305.50   316.00 26444.00     8.09     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00    8.00     0.00  8192.00  1024.00    74.48 2241.50 125.00 100.00
sda               0.00     0.00    0.00    1.00     0.00     8.00     8.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00    1.00     0.00     8.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     3.00    0.00   22.50     0.00  5192.00   230.76    58.21 3204.18  44.44 100.00
sda               0.00     3.00    3.50    6.50    48.00    76.00    12.40     0.09    8.00   5.60   5.60
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    4.00   10.00    52.00    80.00     9.43     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.50  4098.50    1.50   31.50   324.00  4160.00   135.88    78.08 3401.39  30.24  99.80
sda               0.50  4084.00    2.00   32.00   216.00  8200.00   247.53    57.79   27.53  15.35  52.20
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    4.00 4173.00   536.00 33384.00     8.12     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   20.50     0.00  9228.00   450.15    97.71 1776.78  48.78 100.00
sda               0.00     0.00    0.00   32.00     0.00 13536.00   423.00    72.55 1675.31  31.25 100.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   13.00     0.00  7220.00   555.38    67.20 3830.46  76.92 100.00
sda               0.00     0.00    0.00   25.50     0.00 11652.00   456.94    38.91 4491.14  39.22 100.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.50    0.50     4.00     4.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   11.00     0.00  5548.00   504.36    50.62 6367.45  90.91 100.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     2.37    0.00   0.00 100.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00    7.50     0.00  6648.00   886.40    28.48 7513.07 133.33 100.00
sda               0.00     0.00    1.50    3.50    12.00    28.00     8.00     0.24  560.80  20.80  10.40
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    1.50    2.00    12.00    16.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   10.00     0.00  4036.00   403.60    12.15 9193.00 100.00 100.00
sda               0.00     0.00    1.00    0.50     8.00     4.00     8.00     0.02   14.67  14.67   2.20
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    1.00    0.50     8.00     4.00     8.00     0.00    0.00   0.00   0.00
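To compare the two disks across many reports without eyeballing each one, a small parser can help. The following Python sketch is only an illustration: the function name, the 99% saturation threshold, and the choice of summary figures are my own assumptions, and SAMPLE holds just the sda and sdb rows of the first two 2-second reports above.

```python
from collections import defaultdict

# sda/sdb rows of the first two 2-second reports, copied from the question.
SAMPLE = """\
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00    6.00     0.00  6144.00  1024.00    21.40 4545.00 166.67 100.00
sda               0.00     0.00    2.00    1.00    16.00     8.00     8.00     0.49  390.00  75.33  22.60

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00  1405.00    0.50   23.00     4.00 10632.00   452.60    18.96 1889.62  41.62  97.80
sda               0.50  1401.50    1.50   37.50   120.00 11512.00   298.26     4.29  110.00   3.13  12.20
"""


def summarize(text, devices=("sda", "sdb")):
    """Return mean await and share of saturated samples per device."""
    columns = []
    samples = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            # Header row: remember the column names for the rows that follow.
            columns = fields[1:]
            continue
        if fields[0] in devices and columns:
            samples[fields[0]].append(dict(zip(columns, map(float, fields[1:]))))
    summary = {}
    for dev, rows in samples.items():
        summary[dev] = {
            "mean_await_ms": sum(r["await"] for r in rows) / len(rows),
            # Share of samples where the device was (almost) fully busy.
            "saturated_share": sum(r["%util"] >= 99.0 for r in rows) / len(rows),
        }
    return summary


if __name__ == "__main__":
    for dev, stats in sorted(summarize(SAMPLE).items()):
        print(dev, stats)
```

Feeding it the full captured output instead of SAMPLE (for example, the contents of a file that iostat -d -x 2 was redirected to) gives one summary line per disk.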
Mitar
  • Please post your /etc/fstab. Your swap space may not be on RAID. If you run iostat every 2 seconds, is the pattern consistent? – jeffatrackaid Jan 02 '12 at 17:01
  • Posted fstab. Currently the load on the server is not high enough for the problem to appear. I will run it every 2 seconds when I notice the slowdown again. – Mitar Jan 02 '12 at 21:12
  • I see a /vz; is this a virtualized system? If so, I often find odd metric behavior on virtualized disks. – jeffatrackaid Jan 03 '12 at 15:06
  • Yes, it is. We are using OpenVZ. But I am reading stats on host, so metric should be correct. Also, OpenVZ "sees" RAID disks and not hard drives themselves, so there would be no reason why one hard drive would be slower than the other. – Mitar Jan 04 '12 at 02:27
  • jeffatrackaid, I have added more measurements (for every 2 seconds). – Mitar Jan 04 '12 at 02:31

2 Answers


mdadm has an option, -W or --write-mostly, whose description closely matches what you are seeing:

«… This is valid for RAID1 only and means that the 'md' driver will avoid reading from these devices if at all possible. This can be useful if mirroring over a slow link. …» (man mdadm)

Check whether sdb has been flagged this way. This could be the issue.
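One way to check is /proc/mdstat, where write-mostly members are marked with (W) after the device name. Below is a minimal Python sketch that extracts those members. The function name and the sample text are hypothetical (the partition names and block counts are invented for illustration, not taken from the question).

```python
import re


def write_mostly_members(mdstat_text):
    """Map each md array to the member devices flagged (W) in /proc/mdstat."""
    flagged = {}
    for line in mdstat_text.splitlines():
        match = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not match:
            continue
        array, rest = match.groups()
        # Members look like "sdb4[1]", optionally followed by flags such as "(W)" or "(F)".
        members = re.findall(r"(\w+)\[\d+\]((?:\([A-Z]\))*)", rest)
        flagged[array] = [dev for dev, flags in members if "(W)" in flags]
    return flagged


# Hypothetical /proc/mdstat excerpt; device names and sizes are made up.
EXAMPLE = """\
Personalities : [raid1]
md3 : active raid1 sdb4[1](W) sda4[0]
      955626368 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
      9767424 blocks [2/2] [UU]
"""

if __name__ == "__main__":
    print(write_mostly_members(EXAMPLE))
```

On the real host you would pass the contents of /proc/mdstat instead of EXAMPLE. An empty list for every array means no member is flagged write-mostly, and the cause has to be looked for elsewhere.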

poige

I am not sure there actually is a problem. Perhaps you are reading more into these iostat results than is really there. I did some searching, and it seems the iowait output is confusing.

To quote http://us.generation-nt.com/answer/high-await-iostat-help-201223422.html

iowait is one of the most confusing measurements in linux as it had nothing to do with CPU usage! Rather, it just tells you the cpu had nothing else to do AND there was I/O in progress, something you'd expect to see when moving files around. Clearly you should care if your other CPU loads indicators are high but this is not one of them

You should try running more or less real-life speed tests. Try bonnie++: http://packages.debian.org/squeeze/bonnie++ Also look into sysstat: http://packages.debian.org/squeeze/sysstat

And do some regular copying/moving of files back and forth.

Compare the results with another system with a similar setup. If you see no obvious speed issues, then it is quite possible that there is in fact no problem.

aseq