I have 4 disks available on my virtual machine for testing: sdb, sdc, sdd, and sde.
The first 3 disks are used for a RAID5 configuration; the last disk is used as an LVM cache drive.
What I don't understand is the following:
When I create a cache disk of 50GB with a chunk size of 64KiB, xfs_info gives me the following:
[vagrant@node-02 ~]$ xfs_info /data
meta-data=/dev/mapper/data-data isize=512 agcount=32, agsize=16777072 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0 spinodes=0
data = bsize=4096 blocks=536866304, imaxpct=5
= sunit=16 swidth=32 blks
naming =version 2 bsize=8192 ascii-ci=0 ftype=1
log =internal bsize=4096 blocks=262144, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
As we can see here, the sunit=16 and swidth=32 seem to be correct and match the RAID5 layout.
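Just to spell out the arithmetic (sunit/swidth are reported in filesystem blocks, with bsize=4096 here):
# sunit  = 16 blks * 4 KiB =  64 KiB -> one RAID5 chunk
# swidth = 32 blks * 4 KiB = 128 KiB -> 2 data disks * 64 KiB (3-disk RAID5 = 2 data + 1 parity)
echo $((16 * 4096)) $((32 * 4096))   # 65536 131072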
The results of lsblk -t:
[vagrant@node-02 ~]$ lsblk -t
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
sda 0 512 0 512 512 1 deadline 128 4096 0B
├─sda1 0 512 0 512 512 1 deadline 128 4096 0B
└─sda2 0 512 0 512 512 1 deadline 128 4096 0B
├─centos-root 0 512 0 512 512 1 128 4096 0B
├─centos-swap 0 512 0 512 512 1 128 4096 0B
└─centos-home 0 512 0 512 512 1 128 4096 0B
sdb 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_0 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 65536 131072 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_0 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 65536 131072 512 512 1 128 4096 0B
sdc 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_1 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 65536 131072 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_1 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 65536 131072 512 512 1 128 4096 0B
sdd 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_2 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 65536 131072 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_2 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 65536 131072 512 512 1 128 4096 0B
sde 0 512 0 512 512 1 deadline 128 4096 32M
sdf 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-cache_data5_cdata 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5 0 65536 131072 512 512 1 128 4096 0B
└─data5-cache_data5_cmeta 0 512 0 512 512 1 128 4096 32M
└─data5-data5 0 65536 131072 512 512 1 128 4096 0B
sdg 0 512 0 512 512 1 deadline 128 4096 32M
sdh 0 512 0 512 512 1 deadline 128 4096 32M
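If I read this correctly, the MIN-IO/OPT-IO values of the top-level data5-data5 device (65536/131072) are exactly the 64KiB chunk and 128KiB full stripe that xfs_info reported, and as far as I understand mkfs.xfs takes its sunit/swidth defaults from these I/O topology hints. A quick way to read the raw values (mapper name as shown by lsblk above):
# I/O topology hints that the block layer exports for the cached LV
sudo blockdev --getiomin --getioopt /dev/mapper/data5-data5    # 65536 and 131072 here
# the same values via sysfs
DM=$(basename $(readlink -f /dev/mapper/data5-data5))
cat /sys/block/$DM/queue/minimum_io_size /sys/block/$DM/queue/optimal_io_size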
And lvdisplay -a -m data gives me the following:
[vagrant@node-02 ~]$ sudo lvdisplay -m -a data
--- Logical volume ---
LV Path /dev/data/data
LV Name data
VG Name data
LV UUID MBG1p8-beQj-TNDd-Cyx4-QkyN-vdVk-dG6n6I
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:08 +0000
LV Cache pool name cache_data
LV Cache origin name data_corig
LV Status available
# open 1
LV Size <2.00 TiB
Cache used blocks 0.06%
Cache metadata blocks 0.64%
Cache dirty blocks 0.00%
Cache read hits/misses 293 / 66
Cache wrt hits/misses 59 / 41173
Cache demotions 0
Cache promotions 486
Current LE 524284
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:9
--- Segments ---
Logical extents 0 to 524283:
Type cache
Chunk size 64.00 KiB
Metadata format 2
Mode writethrough
Policy smq
--- Logical volume ---
Internal LV Name cache_data
VG Name data
LV UUID apACl6-DtfZ-TURM-vxjD-UhxF-tthY-uSYRGq
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:16 +0000
LV Pool metadata cache_data_cmeta
LV Pool data cache_data_cdata
LV Status NOT available
LV Size 50.00 GiB
Current LE 12800
Segments 1
Allocation inherit
Read ahead sectors auto
--- Segments ---
Logical extents 0 to 12799:
Type cache-pool
Chunk size 64.00 KiB
Metadata format 2
Mode writethrough
Policy smq
--- Logical volume ---
Internal LV Name cache_data_cmeta
VG Name data
LV UUID hmkW6M-CKGO-CTUP-rR4v-KnWn-DbBZ-pJeEA2
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:15 +0000
LV Status available
# open 1
LV Size 1.00 GiB
Current LE 256
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:11
--- Segments ---
Logical extents 0 to 255:
Type linear
Physical volume /dev/sdf
Physical extents 0 to 255
--- Logical volume ---
Internal LV Name cache_data_cdata
VG Name data
LV UUID 9mHe8J-SRiY-l1gl-TO1h-2uCC-Hi10-UpeEVP
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:16 +0000
LV Status available
# open 1
LV Size 50.00 GiB
Current LE 12800
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:10
--- Segments ---
Logical extents 0 to 12799:
Type linear
Physical volume /dev/sdf
Physical extents 256 to 13055
--- Logical volume ---
Internal LV Name data_corig
VG Name data
LV UUID QP8ppy-nv1v-0sii-tANA-6ZzK-EJkP-sLfrh4
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:17 +0000
LV origin of Cache LV data
LV Status available
# open 1
LV Size <2.00 TiB
Current LE 524284
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 768
Block device 253:12
--- Segments ---
Logical extents 0 to 524283:
Type raid5
Monitoring monitored
Raid Data LV 0
Logical volume data_corig_rimage_0
Logical extents 0 to 262141
Raid Data LV 1
Logical volume data_corig_rimage_1
Logical extents 0 to 262141
Raid Data LV 2
Logical volume data_corig_rimage_2
Logical extents 0 to 262141
Raid Metadata LV 0 data_corig_rmeta_0
Raid Metadata LV 1 data_corig_rmeta_1
Raid Metadata LV 2 data_corig_rmeta_2
--- Logical volume ---
Internal LV Name data_corig_rimage_2
VG Name data
LV UUID ...Df7SLj
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:08 +0000
LV Status available
# open 1
LV Size 1023.99 GiB
Current LE 262142
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:8
--- Segments ---
Logical extents 0 to 262141:
Type linear
Physical volume /dev/sdd
Physical extents 1 to 262142
--- Logical volume ---
Internal LV Name data_corig_rmeta_2
VG Name data
LV UUID xi9Ot3-aTnp-bA3z-YL0x-eVaB-87EP-JSM3eN
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:08 +0000
LV Status available
# open 1
LV Size 4.00 MiB
Current LE 1
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:7
--- Segments ---
Logical extents 0 to 0:
Type linear
Physical volume /dev/sdd
Physical extents 0 to 0
We can clearly see the chunk size of 64KiB in the segments.
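(For completeness: the chunk size and the RAID geometry can also be checked more compactly with lvs; the field names below are to the best of my knowledge, so treat this as a sketch.)
# chunk size of the cache pool and stripe geometry of the RAID5 origin in one view
sudo lvs -a -o lv_name,segtype,chunk_size,stripes,stripe_size data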
But when I create a cache disk of 250GB, LVM needs a chunk size of at least 288KiB for the cache pool to accommodate that size. And when I then execute xfs_info, the sunit/swidth values suddenly match the chunk size of the cache drive instead of the RAID5 layout.
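As far as I can tell this minimum comes from LVM capping a cache pool at roughly 1,000,000 chunks and requiring the chunk size to be a multiple of 32KiB (treat those exact limits as my assumption), so for 250GiB:
echo $(( 250 * 1024 * 1024 / 1000000 ))   # 262 KiB per chunk needed to stay under ~1,000,000 chunks
echo $(( (262 / 32 + 1) * 32 ))           # 288 KiB after rounding up to the next 32 KiB multiple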
Output of xfs_info:
[vagrant@node-02 ~]$ xfs_info /data
meta-data=/dev/mapper/data-data isize=512 agcount=32, agsize=16777152 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0 spinodes=0
data = bsize=4096 blocks=536866816, imaxpct=5
= sunit=72 swidth=72 blks
naming =version 2 bsize=8192 ascii-ci=0 ftype=1
log =internal bsize=4096 blocks=262144, version=2
= sectsz=512 sunit=8 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Suddenly we have a sunit and swidth of 72 blocks (72 * 4096 bytes = 288KiB), which matches the chunk size of the cache drive, as we can see with lvdisplay -m -a:
[vagrant@node-02 ~]$ sudo lvdisplay -m -a data
--- Logical volume ---
LV Path /dev/data/data
LV Name data
VG Name data
LV UUID XLHw3w-RkG9-UNh6-WZBM-HtjM-KcV6-6dOdnG
LV Write Access read/write
LV Creation host, time node-2, 2019-09-03 13:36:32 +0000
LV Cache pool name cache_data
LV Cache origin name data_corig
LV Status available
# open 1
LV Size <2.00 TiB
Cache used blocks 0.17%
Cache metadata blocks 0.71%
Cache dirty blocks 0.00%
Cache read hits/misses 202 / 59
Cache wrt hits/misses 8939 / 34110
Cache demotions 0
Cache promotions 1526
Current LE 524284
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:9
--- Segments ---
Logical extents 0 to 524283:
Type cache
Chunk size 288.00 KiB
Metadata format 2
Mode writethrough
Policy smq
--- Logical volume ---
Internal LV Name cache_data
VG Name data
LV UUID Ps7Z1P-y5Ae-ju80-SZjc-yB6S-YBtx-SWL9vO
LV Write Access read/write
LV Creation host, time node-2, 2019-09-03 13:36:40 +0000
LV Pool metadata cache_data_cmeta
LV Pool data cache_data_cdata
LV Status NOT available
LV Size 250.00 GiB
Current LE 64000
Segments 1
Allocation inherit
Read ahead sectors auto
--- Segments ---
Logical extents 0 to 63999:
Type cache-pool
Chunk size 288.00 KiB
Metadata format 2
Mode writethrough
Policy smq
--- Logical volume ---
Internal LV Name cache_data_cmeta
VG Name data
LV UUID k4rVn9-lPJm-2Vvt-77jw-NP1K-PTOs-zFy2ph
LV Write Access read/write
LV Creation host, time node-2, 2019-09-03 13:36:39 +0000
LV Status available
# open 1
LV Size 1.00 GiB
Current LE 256
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:11
--- Segments ---
Logical extents 0 to 255:
Type linear
Physical volume /dev/sdf
Physical extents 0 to 255
--- Logical volume ---
Internal LV Name cache_data_cdata
VG Name data
LV UUID dm571W-f9eX-aFMA-SrPC-PYdd-zs45-ypLksd
LV Write Access read/write
LV Creation host, time node-2, 2019-09-03 13:36:39 +0000
LV Status available
# open 1
LV Size 250.00 GiB
Current LE 64000
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:10
--- Logical volume ---
Internal LV Name data_corig
VG Name data
LV UUID hbYiRO-YnV8-gd1B-shQD-N3SR-xpTl-rOjX8V
LV Write Access read/write
LV Creation host, time node-2, 2019-09-03 13:36:41 +0000
LV origin of Cache LV data
LV Status available
# open 1
LV Size <2.00 TiB
Current LE 524284
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 768
Block device 253:12
--- Segments ---
Logical extents 0 to 524283:
Type raid5
Monitoring monitored
Raid Data LV 0
Logical volume data_corig_rimage_0
Logical extents 0 to 262141
Raid Data LV 1
Logical volume data_corig_rimage_1
Logical extents 0 to 262141
Raid Data LV 2
Logical volume data_corig_rimage_2
Logical extents 0 to 262141
Raid Metadata LV 0 data_corig_rmeta_0
Raid Metadata LV 1 data_corig_rmeta_1
Raid Metadata LV 2 data_corig_rmeta_2
And the output of lsblk -t:
[vagrant@node-02 ~]$ lsblk -t
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
sda 0 512 0 512 512 1 deadline 128 4096 0B
├─sda1 0 512 0 512 512 1 deadline 128 4096 0B
└─sda2 0 512 0 512 512 1 deadline 128 4096 0B
├─centos-root 0 512 0 512 512 1 128 4096 0B
├─centos-swap 0 512 0 512 512 1 128 4096 0B
└─centos-home 0 512 0 512 512 1 128 4096 0B
sdb 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_0 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 294912 294912 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_0 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 294912 294912 512 512 1 128 4096 0B
sdc 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_1 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 294912 294912 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_1 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 294912 294912 512 512 1 128 4096 0B
sdd 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_2 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 294912 294912 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_2 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 294912 294912 512 512 1 128 4096 0B
sde 0 512 0 512 512 1 deadline 128 4096 32M
sdf 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-cache_data5_cdata 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5 0 294912 294912 512 512 1 128 4096 0B
└─data5-cache_data5_cmeta 0 512 0 512 512 1 128 4096 32M
└─data5-data5 0 294912 294912 512 512 1 128 4096 0B
sdg 0 512 0 512 512 1 deadline 128 4096 32M
sdh 0 512 0 512 512 1 deadline 128 4096 32M
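If I read the lsblk output correctly, the top-level data5-data5 device now advertises MIN-IO = OPT-IO = 294912 bytes, i.e. exactly one 288KiB cache chunk, which lines up with the sunit = swidth = 72 blocks that mkfs.xfs chose:
echo $(( 294912 / 4096 ))   # 72 = sunit/swidth expressed in 4 KiB filesystem blocks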
A few questions arise here.
XFS apparently autodetects these settings, but why does XFS choose to use the chunk size of the cache drive? It was able to autodetect the RAID5 layout, as we saw in the first example.
I know that I can pass the su/sw options to mkfs.xfs to get the correct sunit/swidth values, but should I do that in this case?
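For reference, this is what I would run to force the RAID5 geometry from the first example (just a sketch, not something I have settled on; su = RAID chunk size, sw = number of data disks):
# recreate the filesystem with explicit stripe geometry instead of the autodetected one
sudo mkfs.xfs -f -d su=64k,sw=2 /dev/mapper/data-data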
I have been googling for days and I looked in the XFS source code, but I wasn't able to find any clue as to why XFS does this.
So the questions that arise are:
- Why does XFS behave like this?
- Should I define the su/sw manually when running mkfs.xfs?
- Does the chunk size of the cache drive have any influence on the RAID5 setup, and should this be aligned somehow?