I have a CentOS box with 10 2TB drives & an LSI RAID controller, used as an NFS server.

I know I'm going to use RAID 1 to get 5TB of usable space. But in terms of performance, reliability & management, which is better: creating a single 5TB array on the controller, or creating five 1TB arrays and using LVM to regroup them into one (or more) VGs?

I'm particularly interested in hearing why you would pick one approach or the other.

Thanks!

Jeff Leyser
  • 682
  • 6
  • 19
  • Strictly speaking, you can't make "a single 5TB array" using RAID1 from 10 physical 2TB disks. If you use 5 2TB disks to build a RAID1, the array will be only 2TB large, but survive up to 4 failing disks. To make "a single 5TB array" you'll have to use either RAID 0+1 or RAID 1+0. – earl Jul 21 '10 at 22:15
  • You're right, of course. Interestingly, the RAID software will let me create a thing it calls a 5TB RAID 1 array. I wonder what it actually is? – Jeff Leyser Jul 21 '10 at 22:32
  • Late to the party, but you could also go for an 8 TB RAID6 array. Random access performance will be (a lot) worse, sequential read performance will be roughly the same, and redundancy is slightly better (because a single disk failure doesn't mean that part of your data is no longer redundant). – Simon Richter Dec 14 '21 at 14:42

4 Answers

2

If the controller will allow you to provision a 10-disk RAID 10 (rather than an 8-disk unit with 2 disks left over), that would probably be the best bet. It's simple to manage, you get good write performance with battery-backed cache, and the RAID card does all the heavy lifting, monitoring, and management. Just install the RAID card's agent in the OS so you can reconfigure and monitor status from within the OS, and you should be set.
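
For an LSI card like the MegaRAID 8888ELP mentioned in the comments, a status check from within the OS might look something like the sketch below; the MegaCli64 path is an assumption, and the exact flags vary between MegaCli releases.

    # Sketch: poll an LSI MegaRAID controller from the OS with MegaCli.
    # Assumes the MegaCli64 binary is installed (path below is just a common
    # packaging location) and that this runs as root.
    import subprocess

    MEGACLI = "/opt/MegaRAID/MegaCli/MegaCli64"  # assumed install path

    def megacli(*args):
        """Run MegaCli with the given flags and return its text output."""
        result = subprocess.run([MEGACLI, *args], check=True,
                                capture_output=True, text=True)
        return result.stdout

    if __name__ == "__main__":
        print(megacli("-LDInfo", "-Lall", "-aAll"))            # logical drive (array) state
        print(megacli("-PDList", "-aAll"))                      # physical disk health
        print(megacli("-AdpBbuCmd", "-GetBbuStatus", "-aAll"))  # battery-backed cache unit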

Putting everything in the care of the RAID card makes the quality of the software on the card the most important factor. I have had RAID cards crash, causing the whole IO subsystem to "go away" and requiring a server reboot, and I've even had a card completely lose its array configuration, requiring either careful reconfiguration from the console or a full restore from backups. The chances that you, with your one server, will see any particular problem are low, but if you had hundreds or thousands of servers you would probably see these kinds of problems periodically. Maybe newer hardware is better; I haven't had these kinds of problems in a while.

On the other hand, it is possible and even probable that the IO scheduling in Linux is better than what's on the RAID card, so either presenting each disk individually or presenting 5 RAID 1 units and using LVM to stripe across them might give the best read performance. Battery-backed write cache is critical for good write performance, though, so I wouldn't suggest any configuration that doesn't have that feature. Even if you can present the disks as a JBOD and keep battery-backed write cache enabled at the same time, there is additional management overhead and complexity to using Linux software RAID and smartd hardware monitoring. It's easy enough to set up, but you need to work through the procedure for handling drive failures, including the boot drive. It's not as simple as popping out the disk with the yellow blinking light and replacing it. Extra complexity creates room for error.
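
If you do go the 5 x RAID 1 + LVM route, the striping is just an -i flag at lvcreate time. A minimal sketch, assuming the controller presents the five mirrors as /dev/sdb through /dev/sdf (device names and stripe size are placeholders):

    # Sketch: stripe an LVM logical volume across five RAID 1 LUNs.
    # Assumed device names; run as root; destroys any data on the LUNs.
    import subprocess

    MIRRORS = ["/dev/sd%s" % c for c in "bcdef"]  # the five RAID1 LUNs (assumed)

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run(["pvcreate", *MIRRORS])                    # label each LUN as an LVM physical volume
    run(["vgcreate", "vg_nfs", *MIRRORS])          # one volume group over all five
    run(["lvcreate", "-i", "5", "-I", "256",       # stripe across the 5 PVs, 256K stripe size
         "-l", "100%FREE", "-n", "lv_nfs", "vg_nfs"])
    run(["mkfs.ext4", "/dev/vg_nfs/lv_nfs"])       # filesystem for the NFS export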

So I recommend a 10-disk RAID 10 if your controller can do it, or 5 RAID 1s with LVM striping if it can't. If you test your hardware and find that JBOD with Linux RAID works better, then use that, but be sure to test for good random write performance across a large portion of the disk using something like sysbench, rather than just sequential reads with dd.
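
Something along these lines for the benchmark; the flags follow the older sysbench 0.4.x fileio syntax (newer releases drop the --test= form), and the size is a placeholder you'd scale to well beyond the controller cache:

    # Sketch: random-write benchmark with sysbench fileio, not just sequential dd.
    import subprocess

    OPTS = ["--test=fileio",
            "--file-total-size=200G",   # well beyond the controller cache (placeholder)
            "--file-test-mode=rndwr",   # random writes
            "--max-time=300",
            "--max-requests=0",
            "--num-threads=16"]

    for phase in ("prepare", "run", "cleanup"):
        subprocess.run(["sysbench", *OPTS, phase], check=True)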

mtinberg
  • 1,833
  • 11
  • 9
  • The one thing I'd recommend checking with this approach is that it's actually RAID 10, and not 0+1. The difference only matters once two drives fail: with RAID 0+1, after one drive fails, losing any drive from the other stripe set kills the array, whereas with RAID 10 you can lose any additional drive except the other half of the already-failed mirror. The easiest way to test is to fail a drive, then pull out and plug back in other drives, noting whether the entire array fails as you go. Make sense? I *have* seen commercial arrays that do RAID 0+1. – Jeff McJunkin Jul 21 '10 at 23:01
  • As an aside, that card is very likely not to pass any SMART data to the OS. – Banis Jul 22 '10 at 01:40
0

That's actually R10, not R1 - and it's R10 I'd use, i.e. let the OS see all ten raw disks and manage it 100% in software; anything else is needlessly overcomplex.
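
A sketch of what the all-software version might look like with Linux md (device names are assumed; layer LVM on top if you want the volume to be easy to carve up or move):

    # Sketch: Linux software RAID 10 across all ten raw disks.
    # Assumed device names; run as root; destroys existing data.
    import subprocess

    DISKS = ["/dev/sd%s" % c for c in "bcdefghijk"]   # the ten 2TB drives (assumed)

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # md RAID 10: effectively five mirrored pairs striped together (default near-2 layout)
    run(["mdadm", "--create", "/dev/md0", "--level=10",
         "--raid-devices=10", *DISKS])
    run(["mkfs.ext4", "/dev/md0"])   # or pvcreate /dev/md0 and build LVM on top instead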

Chopper3
  • 101,299
  • 9
  • 108
  • 239
  • The upside is that you get to use the neat Linux RAID 10 implementation, but the downside is that you lose the write-performance benefit of battery-backed cache. There is also an upside in that the Linux IO scheduler keeps separate queues for each disk, so it works most efficiently when the number of disks isn't abstracted away; the downside is that management and monitoring of the array and hardware from the OS is more complex, as it's not abstracted away by the RAID controller. – mtinberg Jul 21 '10 at 22:31
  • But isn't that an argument against all hardware RAID? Why not do the RAID 10 at the controller level? – Jeff Leyser Jul 21 '10 at 22:33
  • Don't get me wrong, I do R10 in hardware all the time, but it does tie your array to one manufacturer. Use LVM and the array can move from controller to controller as needed. – Chopper3 Jul 21 '10 at 22:47
0

If you're stuck with 2TB LUNs due to 32-bittedness somewhere, I'd strongly lean towards making 5x 1TB RAID1 LUNs on the RAID card and throwing them into a volume-group to make one big 5TB hunk o' space. That way the card handles the write multiplication implicit in the RAID1 relationship, and you get 5TB of space.
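
A minimal sketch of that arrangement, assuming the card presents the five RAID1 LUNs as /dev/sdb through /dev/sdf (names are placeholders); LVM's default linear allocation simply concatenates the PVs into the one big 5TB hunk:

    # Sketch: concatenate five 1TB RAID1 LUNs into a single 5TB LVM volume.
    # Assumed device names; run as root; destroys any data on the LUNs.
    import subprocess

    LUNS = ["/dev/sd%s" % c for c in "bcdef"]

    for cmd in (
        ["pvcreate", *LUNS],                                     # label each LUN as a PV
        ["vgcreate", "vg_nfs", *LUNS],                           # one VG spanning all five
        ["lvcreate", "-l", "100%FREE", "-n", "lv_nfs", "vg_nfs"] # one linear 5TB LV
    ):
        subprocess.run(cmd, check=True)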

If you can make LUNs larger than 2TB, I lean towards making that one big array on the RAID card. The strength of my lean depends A LOT on the capabilities of the RAID card in question. I don't know what it is, so I can't advise you. If I didn't trust it, I'd stick with the 5x 1TB RAID1 arrangement.

sysadmin1138
  • 133,124
  • 18
  • 176
  • 300
  • 64bit OS, so LUN size is not a problem. "Trust" the RAID array in what sense? It's an LSI MegaRAID 8888ELP. – Jeff Leyser Jul 21 '10 at 22:12
  • It's not about the OS; the question is whether the RAID card is capable of presenting LUNs greater than 2TB, and not all versions of the LSI can. For example, I have a Dell PE2950 with LSI RAID, and I have to have one 2TB LUN and one 1.5TB LUN that I concatenate with LVM because the controller can't present the full size of the RAID disk. – mtinberg Jul 21 '10 at 22:28
  • With a RAID card that fancy, you should be OK there. – sysadmin1138 Jul 21 '10 at 22:49
0

I'd suggest using the expensive RAID controller to do the bulk of the RAID work. LSI cards and the software they come with work quite nicely. When properly configured, they will send you email when interesting things happen to the array, like when disks fail. There is nothing wrong with either of the two Linux software RAID options, but you've gone out and purchased a somewhat fancy RAID card. Let it do the work.

Configure the disk array to expose one big device to Linux. If you want to break the final device up into smaller volumes, use LVM for that: one big physical volume, one big volume group, and cut the volume group into whatever number of logical volumes you need.
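
For instance, a sketch only: /dev/sdb stands in for whatever single device the controller exposes, and the VG/LV names and sizes are made up.

    # Sketch: carve one big controller-exposed device into several LVs with LVM.
    # /dev/sdb, the names, and the sizes are placeholders; run as root.
    import subprocess

    def run(cmd):
        subprocess.run(cmd, check=True)

    run(["pvcreate", "/dev/sdb"])            # one big physical volume
    run(["vgcreate", "vg_nfs", "/dev/sdb"])  # one big volume group
    # ...then cut it into whatever logical volumes you need, e.g.:
    run(["lvcreate", "-L", "2T", "-n", "lv_home", "vg_nfs"])
    run(["lvcreate", "-L", "1T", "-n", "lv_projects", "vg_nfs"])
    run(["lvcreate", "-l", "100%FREE", "-n", "lv_scratch", "vg_nfs"])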

Banis
  • 166
  • 3