I have three 1 TB HDDs and three 500 GB HDDs. Right now each size group is in its own RAID 5 array, and both arrays are in a single LVM volume group (with striped LVs).
I'm finding this too slow for my usage on small random writes. I've fiddled with stripe sizes both at the RAID level and at the LVM striping level, and I've increased the stripe cache and readahead buffer sizes. I've also disabled NCQ, per the usual advice.
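For reference, the tuning I tried looked roughly like the following. The device names and values here are examples only, not an exact record of my commands:

# bigger stripe cache for a RAID 5 array (value is in pages per device)
echo 8192 > /sys/block/md11/md/stripe_cache_size

# bigger readahead on the array (value is in 512-byte sectors)
blockdev --setra 4096 /dev/md11

# effectively disable NCQ on a member disk by forcing its queue depth to 1
echo 1 > /sys/block/sdd/device/queue_depth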
So I am done with Linux software RAID 5. Without a dedicated controller, it's not useful for my purposes.
I am adding another 1 TB drive and another 500 GB drive, so I'll have four of each.
How would you configure the eight drives to get the best small random write performance? I'm excluding simple RAID 0, of course, since the point of this setup is obviously also redundancy. I have considered putting the four 500 GB disks into two RAID 0 pairs and then adding those to a RAID 10 of the four 1 TB drives, for a six-disk RAID 10, but I am not sure that this is the best solution. What say you?
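In case it helps to see what I mean, the nested idea would look something like this with mdadm. The device names, md numbers and chunk size are placeholders, not my actual layout:

# two RAID 0 pairs from the four 500 GB disks, each pair roughly 1 TB
mdadm --create /dev/md21 --level=0 --raid-devices=2 /dev/sde1 /dev/sdf1
mdadm --create /dev/md22 --level=0 --raid-devices=2 /dev/sdg1 /dev/sdh1

# a six-member RAID 10 built from the four 1 TB disks plus the two pairs
mdadm --create /dev/md20 --level=10 --raid-devices=6 --chunk=256 \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/md21 /dev/md22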
Edit: There is no more budget for hardware upgrades. What I am really asking is this: given that the four 1 TB drives can go into a RAID 10 fairly straightforwardly, what do I do with the four 500 GB drives so that they fit in alongside the 4x1TB RAID 10 without becoming a redundancy or performance problem? The other idea I had was to put all four 500 GB drives into their own RAID 10 and then use LVM to add that capacity in with the 4x1TB RAID 10 (sketched below). Is there anything better you can think of?
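A minimal sketch of that second idea, assuming the existing volume group "array" is kept (data migration aside) and, again, with placeholder device names and md numbers:

# RAID 10 across the four 1 TB disks, and a second RAID 10 across the four 500 GB disks
mdadm --create /dev/md20 --level=10 --raid-devices=4 --chunk=256 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm --create /dev/md21 --level=10 --raid-devices=4 --chunk=256 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1

# add both arrays to the volume group so LVM can pool the capacity
pvcreate /dev/md20 /dev/md21
vgextend array /dev/md20 /dev/md21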
Another Edit: The existing array is formatted as follows:
A 1 TB ext4-formatted striped LVM volume used as a file share. Shared to two Macs via AFP.
A 500 GB LVM logical volume exported via iSCSI to a Mac, formatted as HFS+. Used as a Time Machine backup.
A 260 GB LVM logical volume exported via iSCSI to a Mac, formatted as HFS+. Used as a Time Machine backup.
A 200 GB ext4-formatted LVM volume, used as a disk device for a virtualised OS installation.
An LVM snapshot of the 500 GB Time Machine volume.
One thing that I haven't tried is replacing the Time Machine LVs with files on the ext4 filesystem (so that the iSCSI target points at a file instead of a block device). I have a feeling that would solve my speed issues, but it would prevent me from taking snapshots of those volumes, so I am not sure it's worth the trade-off.
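Roughly what I have in mind, assuming an iSCSI Enterprise Target (ietd) setup; the backing-file name and target IQN are made up for the example:

# create a sparse 500 GB backing file on the ext4 volume
dd if=/dev/zero of=/mnt/array/data/etm.img bs=1 count=0 seek=500G

# /etc/ietd.conf: export the file via fileio instead of pointing at the LV
Target iqn.2010-11.local.nas:etm
    Lun 0 Path=/mnt/array/data/etm.img,Type=fileio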
In the future I intend to move an iPhoto and iTunes library onto the server on another HFS+ iSCSI mount; testing that is how I first noticed the abysmal random write performance.
If you're curious, I used the info in the RAID Math section of this URL: http://wiki.centos.org/HowTos/Disk_Optimization to figure out how to set everything up for the ext4 partition (and as a result I'm seeing excellent performance on it). However, this doesn't seem to have done any good for the iSCSI-shared HFS+ volumes.
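The gist of that RAID math, as I applied it (the numbers below are an example for a 256 KiB chunk, 4 KiB filesystem blocks and two data disks, not necessarily my exact figures):

# stride       = chunk size / block size         = 256 KiB / 4 KiB = 64
# stripe-width = stride * number of data disks   = 64 * 2          = 128
mkfs.ext4 -b 4096 -E stride=64,stripe-width=128 /dev/array/data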
A lot more detail:
output of lvdisplay:
--- Logical volume ---
LV Name /dev/array/data
VG Name array
LV UUID 2Lgn1O-q1eA-E1dj-1Nfn-JS2q-lqRR-uEqzom
LV Write Access read/write
LV Status available
# open 1
LV Size 1.00 TiB
Current LE 262144
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 2048
Block device 251:0
--- Logical volume ---
LV Name /dev/array/etm
VG Name array
LV UUID KSwnPb-B38S-Lu2h-sRTS-MG3T-miU2-LfCBU2
LV Write Access read/write
LV snapshot status source of
/dev/array/etm-snapshot [active]
LV Status available
# open 1
LV Size 500.00 GiB
Current LE 128000
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 2048
Block device 251:1
--- Logical volume ---
LV Name /dev/array/jtm
VG Name array
LV UUID wZAK5S-CseH-FtBo-5Fuf-J3le-fVed-WzjpOo
LV Write Access read/write
LV Status available
# open 1
LV Size 260.00 GiB
Current LE 66560
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 2048
Block device 251:2
--- Logical volume ---
LV Name /dev/array/mappingvm
VG Name array
LV UUID 69k2D7-XivP-Zf4o-3SVg-QAbD-jP9W-cG8foD
LV Write Access read/write
LV Status available
# open 0
LV Size 200.00 GiB
Current LE 51200
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 2048
Block device 251:3
--- Logical volume ---
LV Name /dev/array/etm-snapshot
VG Name array
LV UUID 92x9Eo-yFTY-90ib-M0gA-icFP-5kC6-gd25zW
LV Write Access read/write
LV snapshot status active destination for /dev/array/etm
LV Status available
# open 0
LV Size 500.00 GiB
Current LE 128000
COW-table size 500.00 GiB
COW-table LE 128000
Allocated to snapshot 44.89%
Snapshot chunk size 4.00 KiB
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 2048
Block device 251:7
output of pvs --align -o pv_name,pe_start,stripe_size,stripes
PV 1st PE Stripe #Str
/dev/md0 192.00k 0 1
/dev/md0 192.00k 0 1
/dev/md0 192.00k 0 1
/dev/md0 192.00k 0 1
/dev/md0 192.00k 0 0
/dev/md11 512.00k 256.00k 2
/dev/md11 512.00k 256.00k 2
/dev/md11 512.00k 256.00k 2
/dev/md11 512.00k 0 1
/dev/md11 512.00k 0 1
/dev/md11 512.00k 0 0
/dev/md12 512.00k 256.00k 2
/dev/md12 512.00k 256.00k 2
/dev/md12 512.00k 256.00k 2
/dev/md12 512.00k 0 0
output of cat /proc/mdstat
md12 : active raid5 sdc1[1] sde1[0] sdh1[2]
976770560 blocks level 5, 256k chunk, algorithm 2 [3/3] [UUU]
md11 : active raid5 sdg1[2] sdf1[0] sdd1[1]
1953521152 blocks level 5, 256k chunk, algorithm 2 [3/3] [UUU]
output of vgdisplay:
--- Volume group ---
VG Name array
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 8
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 5
Open LV 3
Max PV 0
Cur PV 2
Act PV 2
VG Size 2.73 TiB
PE Size 4.00 MiB
Total PE 715402
Alloc PE / Size 635904 / 2.43 TiB
Free PE / Size 79498 / 310.54 GiB
VG UUID PGE6Oz-jh96-B0Qc-zN9e-LKKX-TK6y-6olGJl
output of dumpe2fs /dev/array/data | head -n 100 (or so)
dumpe2fs 1.41.12 (17-May-2010)
Filesystem volume name: <none>
Last mounted on: /mnt/array/data
Filesystem UUID: b03e8fbb-19e5-479e-a62a-0dca0d1ba567
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 67108864
Block count: 268435456
Reserved block count: 13421772
Free blocks: 113399226
Free inodes: 67046222
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 960
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
RAID stride: 128
RAID stripe width: 128
Flex block group size: 16
Filesystem created: Thu Jul 29 22:51:26 2010
Last mount time: Sun Oct 31 14:26:40 2010
Last write time: Sun Oct 31 14:26:40 2010
Mount count: 1
Maximum mount count: 22
Last checked: Sun Oct 31 14:10:06 2010
Check interval: 15552000 (6 months)
Next check after: Fri Apr 29 14:10:06 2011
Lifetime writes: 677 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 9e6a9db2-c179-495a-bd1a-49dfb57e4020
Journal backup: inode blocks
Journal features: journal_incompat_revoke
Journal size: 128M
Journal length: 32768
Journal sequence: 0x000059af
Journal start: 1
output of lvs array --aligned -o seg_all,lv_all
Type #Str Stripe Stripe Region Region Chunk Chunk Start Start SSize Seg Tags PE Ranges Devices LV UUID LV Attr Maj Min Rahead KMaj KMin KRahead LSize #Seg Origin OSize Snap% Copy% Move Convert LV Tags Log Modules
striped 2 256.00k 256.00k 0 0 0 0 0 0 1.00t /dev/md11:0-131071 /dev/md12:0-131071 /dev/md11(0),/dev/md12(0) 2Lgn1O-q1eA-E1dj-1Nfn-JS2q-lqRR-uEqzom data -wi-ao -1 -1 auto 251 0 1.00m 1.00t 1 0
striped 2 256.00k 256.00k 0 0 0 0 0 0 500.00g /dev/md11:131072-195071 /dev/md12:131072-195071 /dev/md11(131072),/dev/md12(131072) KSwnPb-B38S-Lu2h-sRTS-MG3T-miU2-LfCBU2 etm owi-ao -1 -1 auto 251 1 1.00m 500.00g 1 500.00g snapshot
linear 1 0 0 0 0 4.00k 4.00k 0 0 500.00g /dev/md11:279552-407551 /dev/md11(279552) 92x9Eo-yFTY-90ib-M0gA-icFP-5kC6-gd25zW etm-snapshot swi-a- -1 -1 auto 251 7 1.00m 500.00g 1 etm 500.00g 44.89 snapshot
striped 2 256.00k 256.00k 0 0 0 0 0 0 260.00g /dev/md11:195072-228351 /dev/md12:195072-228351 /dev/md11(195072),/dev/md12(195072) wZAK5S-CseH-FtBo-5Fuf-J3le-fVed-WzjpOo jtm -wi-ao -1 -1 auto 251 2 1.00m 260.00g 1 0
linear 1 0 0 0 0 0 0 0 0 200.00g /dev/md11:228352-279551 /dev/md11(228352) 69k2D7-XivP-Zf4o-3SVg-QAbD-jP9W-cG8foD mappingvm -wi-a- -1 -1 auto 251 3 1.00m 200.00g 1 0
cat /sys/block/md11/queue/logical_block_size
512
cat /sys/block/md11/queue/physical_block_size
512
cat /sys/block/md11/queue/optimal_io_size
524288
cat /sys/block/md11/queue/minimum_io_size
262144
cat /sys/block/md12/queue/minimum_io_size
262144
cat /sys/block/md12/queue/optimal_io_size
524288
cat /sys/block/md12/queue/logical_block_size
512
cat /sys/block/md12/queue/physical_block_size
512
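For what it's worth, those queue values line up with the mdstat output above, if I'm reading them right:

# minimum_io_size = 262144 B = 256 KiB -> one RAID chunk
# optimal_io_size = 524288 B = 512 KiB -> 256 KiB x 2 data disks = one full RAID 5 stripe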
Edit: So, can no one tell me whether or not there is something wrong here? No concrete advice at all? Hmm.