
I'm doing something like this, and formatting the filesystem is taking a long time, so I'm guessing I have something wrong.

sda is a spinning disk; sdb, sdc and sdd are SSDs.

clearpart --drives=sda,sdb,sdc,sdd --initlabel
part swap --recommended --ondisk=sda
part / --size=4096 --grow --ondisk=sda
part pv.11 --size=100 --grow --ondisk=sdb
part pv.12 --size=100 --grow --ondisk=sdc
part pv.13 --size=100 --grow --ondisk=sdd
volgroup datavg --pesize=4096 pv.11 pv.12 pv.13
logvol /data --fstype=ext4 --name=datalv --vgname=datavg --size=100 --grow --percent=100 raid.11 raid.12 raid.13

Questions:

  1. Should I be passing ext4 options?
  2. Is my stripe of 4M correct?
  3. Should I be adjusting anything else in LVM?

The goal:

  1. stripe three SSD disks together for performance under a Cassandra load.
  2. Support TRIM (see the quick check after this list)
  3. use the drives correctly to extend lifespan
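
For goal 2, a quick way to confirm the drives actually advertise TRIM before relying on discard (sdb shown as an example; the sysfs path depends on the kernel):

# hdparm prints a "Data Set Management TRIM supported" line for TRIM-capable drives
hdparm -I /dev/sdb | grep -i trim
# if the kernel exposes it, a non-zero discard_granularity also indicates discard support
cat /sys/block/sdb/queue/discard_granularity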

Thanks!

Alan

alan laird

2 Answers


If by 'stripe' you mean your PE size, we are talking about two different things here. A physical extent is just the unit of storage by which you extend a logical volume. A stripe is a chunk of data that is divided over several storage devices; a physical extent has no such requirement.
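
If you want actual striping in LVM, you request it when creating the logical volume with --stripes/-i and --stripesize/-I; the volume group's --pesize has nothing to do with it. A minimal sketch, assuming the three SSD partitions from your kickstart and a placeholder 256 KiB stripe size:

# --physicalextentsize only sets the VG's allocation unit, not the stripe size
vgcreate --physicalextentsize 4M datavg /dev/sdb1 /dev/sdc1 /dev/sdd1
# -i 3 spreads the LV over three PVs, -I 256 uses a 256 KiB per-disk chunk (assumed value)
lvcreate -i 3 -I 256 -l 100%FREE -n datalv datavg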

What I think would happen is that your LV would span all three disks, but in a way that fills up sdb first, then sdc and finally sdd.
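
You can check which layout you actually ended up with after the install, for example:

# segtype shows 'linear' for a concatenated LV and 'striped' for a striped one
lvs -o lv_name,segtype,stripes,devices datavg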

I'm not sure LVM is the most logical way to do this. Personally, I'd go for a software RAID set with mdadm. That would give you an actual RAID set rather than an LV that merely spans three disks.
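
A minimal mdadm sketch along those lines, using RAID 0 for pure striping (the chunk size is a placeholder until you have measured your workload):

# stripe the three SSD partitions; --chunk is in KiB and is an assumed value
mdadm --create /dev/md0 --level=0 --raid-devices=3 --chunk=256 /dev/sdb1 /dev/sdc1 /dev/sdd1
mkfs.ext4 /dev/md0
# record the array so it is assembled on boot
mdadm --detail --scan >> /etc/mdadm.conf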

Apart from that: there is no way to tell whether your stripe of 4 MiB is the right size without more information. The optimal stripe size is determined by the average request size, i.e. the average size of a read request against the disk set. What you need to do is run a typical load on the machine and use sar or something similar to measure your average request size. Then you reinstall the machine (or just reformat the RAID set) and use that average request size divided by two(!) as the size of a data chunk. The third disk will hold your parity blocks (assuming you are building a RAID 5 set). Your stripe size will then be (avg req size / 2) * 3, with two data chunks and a parity chunk, each of size (avg req size / 2).
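
As a rough illustration of that calculation (the numbers below are made up, not measured):

# sample disk statistics while the workload runs; avgrq-sz is reported in 512-byte sectors
sar -d 5 12        # or: iostat -x 5

# example: avgrq-sz = 256 sectors  ->  256 * 512 B = 128 KiB average request
# data chunk  = 128 KiB / 2        =  64 KiB per data disk
# full stripe =  64 KiB * 3 disks  = 192 KiB (two data chunks + one parity chunk)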

I hope I'm saying this right :) I'm getting tired. I'll go over this again tomorrow to see if I make any sense :)

Usually, the average request size will be on the order of kilobytes, not megabytes, but there is no way of knowing this for your specific situation without measuring.

That said, I have little experience with using LVM for RAID-like schemes.

wzzrd

I ended up doing something like this:

# Partition clearing information
clearpart --drives=sda,sdb,sdc,sdd --initlabel
part swap --recommended --ondisk=sda
part / --size=4096 --grow --ondisk=sda
# create the ssd partitions
part pv.11 --size=100 --grow --ondisk=sdb
part pv.12 --size=100 --grow --ondisk=sdc
part pv.13 --size=100 --grow --ondisk=sdd

%post
vgcreate cassvg /dev/sdb1 /dev/sdc1 /dev/sdd1
lvcreate -l 100%FREE --stripes 3 --stripesize 4096 --name casslv cassvg
mkfs.ext4 -b 4096 -E stride=128,stripe-width=128 /dev/cassvg/casslv
mkdir -p /cache/cassandra/storage
echo "/dev/mapper/cassvg-casslv     /cache/cassandra/storage ext4   discard, noatime,nodiratime 1 2" >> /etc/fstab

I moved the volume group and logical volume creation to the %post section to avoid an issue with the newish initramfs kickstart environment in CentOS 6.2. Creating them in %post creates them with the correct hostname, so they mount correctly at runtime.

There are more sophisticated ways to deal with the initramfs issue, but they seemed like a rabbit hole when I could just move the creation to %post for this application data volume.

alan laird