
I'm running RHEL 5.5 with multipath against an HP HSV200 storage system.

Disk write performance is VERY poor compared to the Windows counterparts, which use the same storage and multipathing.

Here are the results:

mpath17 (3600508b400105f9d0002100000780000) dm-12 HP,HSV200
[size=850G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=100][active]
 \_ 2:0:1:30  sdaw       67:0   [active][ready]
 \_ 1:0:1:30  sdc        8:32   [active][ready]
\_ round-robin 0 [prio=20][enabled]
 \_ 2:0:0:30  sdau       66:224 [active][ready]
 \_ 1:0:0:30  sda        8:0    [active][ready]

`atop` result:

LVM |      mpath17  | busy     99% |  read    3077 | write      6  | KiB/r     90 |               | KiB/w      4 |  MBr/s  27.11 | MBw/s   0.00  | avq     2.41 |  avio 3.21 ms 

Note that "busy" is at 99% - and this happens most of the time.

The multipath.conf follows HP's recommended best practices for this storage:

device {
        vendor                  "HP"
        product                 "HSV2[01]0|HSV3[046]0|HSV4[05]0"
        path_grouping_policy    group_by_prio
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            "/sbin/mpath_prio_alua /dev/%n"
        path_selector           "round-robin 0"
        path_checker            tur
        hardware_handler        "0"
        failback                immediate
        rr_weight               uniform
        rr_min_io               100
        no_path_retry           18
}

Is there any way to diagnose this? I want to understand where the bottleneck is in this scenario... Any suggestions on where to start?

(This is my first post here, thank you very much)

  • Can you provide a little more context? How long has this been a problem? RHEL 5.5 is very old by today's standards, so has this been a persistent issue, or did it appear recently? – ewwhite Dec 21 '12 at 14:52
  • @ewwhite well, this event happens whenever there's a big load on the disk e.g. a full daily backup. What doesn't make sense is that low performance (100% busy with 30MB/s!) – Daniel Sartori Dec 21 '12 at 18:57

2 Answers


This could be a symptom of a back-end storage problem. How is the storage behind this LUN configured: what disk type, how many disks, and what RAID level? Is the cache set to write-back?

You mentioned in a comment that you're quantifying disk utilization in MB/s; however, for non-SSD drives the limit usually isn't MB/s but IOPS, because spinning disks have to seek for random reads.
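To put rough numbers on that, here is a back-of-envelope check based on the `atop` line in the question. It assumes a 10-second atop sampling interval and a ballpark figure of ~150 random IOPS per 10k-rpm spindle - both are assumptions, not measurements from your array:

```python
# Back-of-envelope IOPS math from the atop sample (illustrative only).
interval_s = 10        # ASSUMPTION: default atop sampling interval
reads = 3077           # "read" column: reads during the interval
kib_per_read = 90      # "KiB/r" column

reads_per_sec = reads / interval_s                  # ~308 IOPS
mb_per_sec = reads_per_sec * kib_per_read / 1024    # ~27 MB/s, matches the MBr/s column

# ASSUMPTION: a single 10k-rpm spinning disk sustains ~150 random read IOPS
iops_per_spindle = 150
spindles_worth = reads_per_sec / iops_per_spindle   # ~2 spindles' worth of random I/O

print(f"{reads_per_sec:.0f} IOPS, {mb_per_sec:.1f} MB/s, "
      f"~{spindles_worth:.1f} spindles of random reads")
```

The point: ~27 MB/s sounds low, but at ~90 KiB per request it already represents ~300 random IOPS, which only a handful of spindles (or a starved cache) can saturate. That is why "100% busy at 30 MB/s" is plausible on rotating disks.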

Basil

The whole problem was the disk controller: it had no controller cache, so it performed poorly in many scenarios, such as huge file writes or many files being written at the same time.

Thank you for the diagnosis.