
A small machine (1 core, 1 GB RAM, CentOS 6.3), virtualized via Citrix Xen, has 3 virtual disks of very different sizes.

> cat /etc/fstab (snippet)
...
/dev/mapper/vg_stagingnfs-lv_root   /   ext4    defaults    1   1 # on /dev/xvda
/dev/disk/by-uuid/8048fd86-3aa3-4cdd-92fe-c19cc97d3c2e  /opt/xxx/data/nexus ext4    defaults    0   0
/dev/disk/by-uuid/58f16c69-786e-47d0-93ae-d57fb0cbd2a9  /opt/xxx/data/nfs   ext4    defaults    0   0

> mount (snippet)
...
/dev/mapper/vg_stagingnfs-lv_root on / type ext4 (rw) 
/dev/xvdb1 on /opt/xxx/data/nexus type ext4 (rw)
/dev/xvdc1 on /opt/xxx/data/nfs type ext4 (rw)

> df -h (snippet)
...
/dev/mapper/vg_stagingnfs-lv_root
                      5.5G  3.1G  2.2G  59% / 
/dev/xvdb1            2.0T   60G  1.9T   4% /opt/xxx/data/nexus
/dev/xvdc1            729G  144G  548G  21% /opt/xxx/data/nfs

Device /dev/xvda is a virtual disk inside a "storage repository" backed by a 4-Disk-Raid5. Devices /dev/xvd{b|c} are virtual disks both inside another "storage repository" backed by another 4-Disk-Raid5. Disk performance between (let's keep it simple) xvda and xvdb is dramatically different:

> dd if=/dev/zero of=/root/outfile bs=1024k count=1000
1048576000 bytes (1.0 GB) copied, 8.61225 s, 122 MB/s

> dd if=/dev/zero of=/opt/xxx/data/nexus/outfile bs=1024k count=1000
1048576000 bytes (1.0 GB) copied, 86.241 s, 12.2 MB/s
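
Both of those numbers include page-cache effects, so it may be worth repeating the runs with caching taken out of the picture. A minimal sketch, assuming the CentOS 6 coreutils dd (which supports oflag=direct and conv=fdatasync):

> dd if=/dev/zero of=/root/outfile bs=1024k count=1000 oflag=direct
> dd if=/dev/zero of=/opt/xxx/data/nexus/outfile bs=1024k count=1000 oflag=direct

oflag=direct bypasses the page cache entirely; alternatively, conv=fdatasync keeps the cache but forces the data to disk before dd reports its rate.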

I haven't spotted any obvious explanation for the difference via top, atop, iotop or iostat. During both dd runs I notice 3 main commands causing load: dd, flush-xxx and jbd2/xvdbxxx. The main types of load are %sy and %wa. While dd-ing to xvda the %sy:%wa ratio is roughly 20%:80%; while dd-ing to xvdb it looks more like 0%:100%.
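
To put harder numbers on that %wa difference, watching the extended iostat columns during each run should help. A sketch, assuming the sysstat package that provides iostat is installed:

> iostat -xm 2 xvda xvdb

The interesting columns are await, avgqu-sz and %util: a high await with low throughput on xvdb would point at the path below the guest rather than at anything the VM itself is doing.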

Now the big question: WTF? I'm running out of ideas for tracking down the root cause any further. Any ideas on how to get to the bottom of this?

Your help is highly appreciated!

I'll add some extra information:

  • both storage repositories are LVM-backed
  • both are local to the Xen host
  • strange: the faster storage repository contains the virtual disks of > 20 other VMs (and xvda of this VM); xvdb/xvdc are the only disks in the slower storage repository and are attached only to this very VM. Anyway, I additionally created a third virtual disk on that slow storage repository and attached it to a different VM: same effect... (a raw-device read test that might narrow this down further is sketched below)
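
One more data point that might help separate the ext4 filesystem and its journal from the block path underneath is reading the virtual disks directly. A read-only sketch (iflag=direct again assumes the coreutils dd):

> dd if=/dev/xvda of=/dev/null bs=1024k count=1000 iflag=direct
> dd if=/dev/xvdb of=/dev/null bs=1024k count=1000 iflag=direct

If the raw reads show the same 10x gap, the filesystem, journal (jbd2) and flush threads are off the hook and the problem sits below the guest's block layer.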

Information gathered on the Xen host (mostly looking for evidence of bad disks):

# xe sr-list (snippet)
...
uuid ( RO)                : 88decbcc-a88c-b368-38dd-dc11bfa723f6
          name-label ( RW): Local storage 2 on xen-build2 
    name-description ( RW): RAID5 4x1TB 7.200 rpm MDL Disks # a.k.a. the too slow one
                host ( RO): xen-build2
                type ( RO): lvm
        content-type ( RO): user
uuid ( RO)                : b4bae2a7-02fd-f146-fd95-51f573c9b27d
          name-label ( RW): Local storage 
    name-description ( RW): # a.k.a. the reasonably fast one
                host ( RO): xen-build2
                type ( RO): lvm
        content-type ( RO): user

# vgscan -v (snippet)
Wiping cache of LVM-capable devices
Wiping internal VG cache
Reading all physical volumes.  This may take a while...
Finding all volume groups
Finding volume group "VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6"
Found volume group "VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6" using metadata type lvm2
Finding volume group "VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d"
Found volume group "VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d" using metadata type lvm2

# lvmdiskscan (snippet)
...
/dev/sdb     [      838.33 GB] LVM physical volume # reasonably fast
/dev/sdc     [        2.73 TB] LVM physical volume # too slow
3 disks
16 partitions
2 LVM physical volume whole disks
1 LVM physical volume

# vgck -v
Finding all volume groups
Finding volume group "VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6"
Finding volume group "VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d"

# pvck -v
(no output)

# lvs
LV                                       VG                                                     Attr   LSize   Origin Snap%  Move Log Copy%  Convert
MGT                                      VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6 -wi-a-   4.00M
VHD-2190be94-2e94-4df1-a78e-b2ee1edf2400 VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6 -wi-ao   1.76G                                      
VHD-b1971dad-60f0-4d3a-a63d-2f3184d74035 VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6 -wi-ao 741.45G                                      
VHD-f0c7cc8f-1d69-421d-8a57-97b20c32e170 VG_XenStorage-88decbcc-a88c-b368-38dd-dc11bfa723f6 -wi-ao   2.00T
MGT                                      VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-a-   4.00M
VHD-02a0d5b5-a7e5-4163-a2fa-8fd651ed6df3 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  20.05G                                      
VHD-0911628d-e03a-459a-83f4-f8c699aee619 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  50.11G                                      
VHD-0950ba89-401d-433f-87bb-8f1ab9407a4b VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  30.07G                                      
VHD-18e93da6-d18d-4c27-8ea6-4fece41c75c1 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi---   8.02G                                      
VHD-1b5ced06-a788-4e72-9adf-ea648c816e2e VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi--- 256.00M                                      
VHD-22fe1662-6b5d-49f5-b729-ec9acd7787ee VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 120.24G                                      
VHD-23cb8155-39c1-45aa-b6a5-bb8a961707b7 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao   8.02G                                      
VHD-25913e86-214f-4b7f-b886-770247c1d716 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  10.03G                                      
VHD-44c5045c-6432-48cf-85d3-646e46a3d849 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi---  20.05G                                      
VHD-4d5f779d-51a9-4087-b113-4d99f16d6779 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  50.11G                                      
VHD-4e4749c7-8de6-4013-87cb-be53ac112f4f VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  30.07G                                      
VHD-503a68d4-182f-450e-8c34-7568f9472668 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  20.05G                                      
VHD-5dc961e0-beb2-4ce3-b888-b16a26dd77a5 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  50.11G
VHD-6d4ee024-789a-46f5-8922-edf15ac415cd VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  50.11G                                      
VHD-7b80f83f-6a0f-4311-8d32-c8f51b547b3d VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao 120.24G                                      
VHD-81aa93fa-dbf5-4a4a-ba21-20693508ec4a VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  10.03G                                      
VHD-85cb8e94-fd07-4717-8cca-871f07099fb0 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  50.11G                                      
VHD-8e8f63c3-ab21-4707-8736-af0b279c9b7e VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi---  16.00M                                      
VHD-965cc67a-5cb9-4d79-8916-047bfd42955d VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  64.13G                                      
VHD-c1abfb8d-12bc-4852-a83f-ccbc6ca488b8 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi--- 100.20G                                      
VHD-d679959b-2749-47e2-9933-e9f008aea248 VG_XenStorage-b4bae2a7-02fd-f146-fd95-51f573c9b27d -wi-ao  75.15G 

AFAICS, adding more "-v"s still outputs nothing that points to a bad disk... Any other checks that would identify a bad disk? Thx!
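
A few host-side checks that might surface a failing disk (a sketch only: smartctl may need a controller-specific -d option if the RAID5 sits behind a hardware controller, and /proc/mdstat is only relevant if it is Linux software RAID):

# smartctl -H -A /dev/sdc   # SMART health and attributes of the slow PV; compare with /dev/sdb
# cat /proc/mdstat          # degraded or rebuilding arrays, if this is md software RAID
# dmesg | grep -iE 'sdc|error|timeout'   # resets, medium errors, link problems

A single slow or erroring member disk can drag a whole RAID5 down to this kind of throughput without the array being reported as failed.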

3 Answers


Given that you have two different RAID sets, one can imagine a lot of possible reasons for the difference in performance. The fact that both storage repositories are backed by a 4-disk RAID5 does not mean you can expect similar performance.

Possible reasons:

  • slower or dying disks
  • slower RAID controller
  • file-backed vs. logical volume backed
  • local vs. remote (NFS, iSCSI)
  • different filesystem (if file-backed)
  • I/O performance of one storage repository consumed by other virtual machines
  • ...

I think you should go outside your virtual machine to debug this further.

Oliver
  • _slower or dying disks_: Having that checked, as I can't physically access the machine; _slower RAID controller_: I don't think there's a difference, but I'll have that checked too; _file-backed vs. logical volume backed_: Both LVM-backed; _local vs. remote (NFS, iSCSI)_: Both local; _different filesystem (if file-backed)_: LVM; _I/O performance of one storage repository consumed by other VMs_: Actually the most "disturbing" part. Disk xvda (fast) is inside the storage repository that contains disks used by >20 other virtual machines. *Disks xvdb/xvdc are the sole disks inside the other storage repository* – user2124712 Jun 24 '13 at 09:48
  • Can you repeat your test directly on /dev/sdc in the Dom0, i.e. without going through the virtualization layer? – Oliver Jun 24 '13 at 14:32
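
A read-only sketch of that Dom0 test, assuming /dev/sdc and /dev/sdb are the PVs of the slow and fast storage repositories as shown by lvmdiskscan above:

# dd if=/dev/sdc of=/dev/null bs=1024k count=1000 iflag=direct   # slow SR's PV
# dd if=/dev/sdb of=/dev/null bs=1024k count=1000 iflag=direct   # fast SR's PV, for comparison

If /dev/sdc is also slow here, the guest and the Xen block layer are ruled out and the problem is in the array or controller; if it is fast, the focus shifts back to the virtualization layer.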

With a performance difference this large, assume a bug (:-))

In all seriousness, common accidental-pessimization problems will slow you by 10-20%. Performance bugs, like the previously-mentioned dying disk, will slow you by orders of magnitude.

As a performance engineer, most of what I see are bugs.

--dave

davecb

If those arrays are network mounts, make sure nothing between the machine and the slower array has been throttled down to ~100 Mbps for some reason.

Other than that check for hardware fault or contention as others suggest.

David Spillett
  • Both arrays local, thx – user2124712 Jun 24 '13 at 09:45
  • The virtualization layer may still be configured to throttle at its network layer if it is pretending that the vdisks the guest OS sees are connected via iSCSI or similar. – David Spillett Jun 24 '13 at 10:37
  • Any idea how I could rule this out? – user2124712 Jun 24 '13 at 11:47
  • In the VMs check all the network interfaces to see what they are doing and claim to be capable of (under Linux `ethtool ` will return what the NIC claims to support (10/100/1000/other, duplex half/full, ...) and what it has currently negotiated) - if you find it lists 1000+ but is talking at 100, try forcing renegotiation with `ethtool --negotiate ` or stop+start the interface completely (though be very careful with the latter if anything might be currently using resources via that link). In the host check the configuration options for the host's vNICs and any virtual switches. – David Spillett Jun 25 '13 at 09:43
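
A minimal sketch of those interface checks inside the VM (eth0 is just an example name, the interfaces here may differ):

> ethtool eth0      # compare "Speed:" and "Duplex:" with the supported/advertised link modes
> ethtool -r eth0   # restart auto-negotiation (long form: --negotiate)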