
We have a backup server with 66TB of available space, set up like so:

12 6TB RAID10 arrays -> 12 PVs -> 1 VG -> 1 LV -> xfs
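
Roughly speaking, the stack was built like this (a reconstruction for context only; the exact commands and options may have differed):

# reconstruction for illustration only -- sdb..sdm are the 12 RAID10 arrays
pvcreate /dev/sd[b-m]
vgcreate backup_vg1 /dev/sd[b-m]
lvcreate -l 100%FREE -n BackupLV backup_vg1    # default allocation policy (linear concatenation, not striped)
mkfs.xfs /dev/backup_vg1/BackupLV              # no explicit geometry options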

This filesystem is used exclusively for backups (through BackupPC). It receives quite a bit of I/O, but definitely not so much that the hardware should have trouble with it. However, we've been experiencing many failed backups, and I recently noticed that even writing a single 10-line file on the mount takes >20 seconds. A run of iostat shows why:

[root@lolno BackupPC]# iostat 
Linux 2.6.18-194.17.1.el5 (lolno)      06/27/2012

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          19.93    0.00    9.53   31.95    0.00   38.59

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               5.34       115.29        43.07  874600222  326773590
sda1              0.00         0.00         0.00       3586        126
sda2              5.34       115.29        43.07  874594516  326773464
sdb             207.33      3544.02      1594.70 26886233184 12097955904
sdc              11.39       844.42      1033.16 6406058328 7837945704
sdd              11.20       622.92       481.77 4725691832 3654875728
sde              15.84      1812.99      1661.13 13754015304 12601927592
sdf              14.86      1483.24       888.80 11252361120 6742733600
sdg              11.67      1020.94       746.05 7745220408 5659828008
sdh              22.60      1127.12      1424.24 8550776952 10804834600
sdi              12.66      1176.71      1043.97 8926929272 7919898000
sdj              13.50      1140.80       912.27 8654489384 6920787296
sdk              13.14      1314.36      1041.48 9971220872 7901060992
sdl              11.16       674.53       366.49 5117257008 2780306920
sdm              10.82       829.36       604.99 6291851320 4589685592
dm-0              2.82        24.81         9.80  188208594   74373432
dm-1              0.00         0.00         0.00        680          0
dm-2              2.52        50.08         5.81  379928338   44067760
dm-3              8.48        40.40        27.46  306454472  208332272
dm-4            364.33     15591.41     11799.05 118282051176 89511839936

As you can see, instead of the I/O being spread evenly across the disks/PVs, the vast majority of it is concentrated on a single disk (sdb). What would cause this?

Some more info on the system:

It's running CentOS 5.5, with kernel 2.6.18-194.17.1.el5

[root@lolno BackupPC]# xfs_info /data 
meta-data=/dev/mapper/backup_vg1-BackupLV isize=256    agcount=66, agsize=268435455 blks 
         =                       sectsz=4096  attr=0 
data     =                       bsize=4096   blocks=17581608960, imaxpct=25 
         =                       sunit=0      swidth=0 blks, unwritten=1 
naming   =version 2              bsize=4096   
log      =internal               bsize=4096   blocks=32768, version=2 
         =                       sectsz=4096  sunit=1 blks, lazy-count=0 
realtime =none                   extsz=4096   blocks=0, rtextents=0 


[root@lolno BackupPC]# lvdisplay -v /dev/backup_vg1/BackupLV
        Using logical volume(s) on command line
    --- Logical volume ---
    LV Name                /dev/backup_vg1/BackupLV
    VG Name                backup_vg1
    LV UUID                L8i09U-lVxh-1ETM-mNRQ-j3es-uCSI-M1xz45
    LV Write Access        read/write
    LV Status              available
    # open                 1
    LV Size                65.50 TB
    Current LE             17169540
    Segments               12
    Allocation             inherit
    Read ahead sectors     auto
    - currently set to     256
    Block device           253:4

My first thought was that this has something to do with a lack of striping in xfs, but according to http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/xfsmain.html#xfscreating

"When creating filesystems on lvm or md volumes, mkfs.xfs chooses an optimal geometry."

So is that just not happening here, or is there something else going on?

Gleesus

1 Answer


From agcount=66 you can see you have 66 Allocation Groups (so 66 potential IO threads) but only 12 physical block devices.
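
As a quick sanity check with the numbers from your xfs_info output (rough arithmetic only):

# agsize (blocks) x block size = bytes per AG
echo $((268435455 * 4096))    # ~1.1 * 10^12 bytes, i.e. roughly 1 TiB per AG
# 66 AGs x ~1 TiB accounts for the 65.5 TB LV, so each ~6 TB PV holds parts of 5-6 AGs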

XFS will try to put each new directory in a different AG, so if you're doing a lot of IO to the same directory, you may be doing single-threaded IO to the one AG, which is stored on the one block device.
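
If you want to see where a given file actually landed, xfs_bmap can show which AG each of its extents lives in (the path below is just an example):

# the AG column in verbose output is the allocation group holding each extent
xfs_bmap -v /data/path/to/some/backup/file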

It's also possible that even if you're doing IO to different AGs, several of those 66 AGs sit on the same block device: 66/12 = 5.5, so a single underlying block device can hold (parts of) 5 or 6 AGs, and up to that many IO threads can end up writing to the same disk at once.
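
You can map AGs back to physical devices by looking at the LV's segment layout, since an AG's byte offset into the filesystem translates directly to an offset into the LV (and the AGs here are roughly 1 TiB each):

# show which PV backs each segment (extent range) of the LV
lvdisplay -m /dev/backup_vg1/BackupLV
lvs --segments -o+devices backup_vg1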

From sunit=0 swidth=0 you can see the filesystem is not aware of the underlying RAID array at all.

I think your filesystem has been made incorrectly. mkfs.xfs is not really that smart.

Have a read of the XFS documentation, learn how the filesystem is structured and how your existing data is likely to end up spread across those structures. It's a surprisingly easy filesystem to understand.

You're in a good position here because you actually have data to look at; you're not working from some imaginary specification from the app developers that will change over time.

Re-make your filesystem to better suit your data, block devices, and RAID layout. In particular, the "How to calculate the correct sunit,swidth values for optimal performance" entry in the XFS FAQ will be useful to you, though that's definitely not the only thing to pay attention to.
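
As a sketch only, assuming (purely for illustration) a 256 KiB chunk size and 6 data-bearing disks per RAID10 array -- plug in your real values from the FAQ formula, and note that this destroys the existing filesystem:

# ILLUSTRATIVE values -- derive su/sw from your actual RAID10 chunk size and disk count
mkfs.xfs -f -d su=256k,sw=6 /dev/backup_vg1/BackupLV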

suprjami