
I am setting up a 5-node Cassandra cluster and will have ten 1.9 TB SSDs available per node. I want to use LVM to combine the 10 drives and allocate the disk space each node needs.

The only documentation I can find for Cassandra mentions JBOD or RAID. Can I use LVM, or is that going to cause issues within Cassandra?

This is more of an informational question before I get started; I haven't actually tried anything yet.

Rich Michaels

1 Answer


Yes, you can use LVM; we have done so with one of our clusters. If you use LVM, make sure the devices are striped rather than linear. With a linear volume, the first device fills up, then the second, then the third, and so on, so many devices can sit idle while others are very busy. The downside of striped mode is that it is hard to change the configuration later (growing or shrinking the volume): you can't simply extend a striped volume by adding a single device.

We have also used JBOD. With JBOD, the same directories are duplicated on every device, and any given sstable lands on one device or another, which is unpredictable and somewhat "messy". Because each sstable lives entirely on one device, you don't really get striping per se; Cassandra only tries to distribute the sstables evenly across the devices. Also, because the individual devices are smaller, you can run into a space problem during compaction if, say, one device does not have enough room to compact the sstables on it.

Personally, I would choose LVM because it is much cleaner. You may see some slight overhead, as I believe LVM may batch up some operations before performing them, but it hasn't seemed significant to me.
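To make the linear vs. striped difference and the JBOD compaction concern concrete, here is a rough Python sketch. It is a toy model, not how LVM or Cassandra actually allocate space: the function names, the 1 GB stripe unit, and the rule that a compaction needs free space equal to the combined size of its input sstables on a single device are all simplifying assumptions made for illustration.

```python
def fill_devices(layout, n_devices, device_capacity, data_written):
    """Toy model: how many units end up on each device after a write.

    'linear'  fills device 0 completely before touching device 1, etc.
    'striped' deals the data out round-robin, one unit at a time.
    """
    if data_written > n_devices * device_capacity:
        raise ValueError("more data than total capacity")
    used = [0] * n_devices
    if layout == "linear":
        remaining = data_written
        for i in range(n_devices):
            chunk = min(device_capacity, remaining)
            used[i] = chunk
            remaining -= chunk
    elif layout == "striped":
        per_device, leftover = divmod(data_written, n_devices)
        for i in range(n_devices):
            # the first `leftover` devices receive one extra unit
            used[i] = per_device + (1 if i < leftover else 0)
    else:
        raise ValueError(f"unknown layout: {layout}")
    return used


def compaction_fits(free_per_device, input_sstable_sizes, pooled):
    """Assumed worst case: the output is as large as all inputs combined.

    pooled=True  models one big LVM volume (free space is shared).
    pooled=False models JBOD (the output must fit on one device).
    """
    needed = sum(input_sstable_sizes)
    if pooled:
        return needed <= sum(free_per_device)
    return any(free >= needed for free in free_per_device)


if __name__ == "__main__":
    # 10 devices of 1900 GB each, 3800 GB written (numbers from the question)
    print("linear :", fill_devices("linear", 10, 1900, 3800))
    print("striped:", fill_devices("striped", 10, 1900, 3800))
    # 300 GB free on every device, compaction of two 250 GB sstables
    free = [300] * 10
    print("JBOD fits:", compaction_fits(free, [250, 250], pooled=False))
    print("LVM fits :", compaction_fits(free, [250, 250], pooled=True))
```

In this model the linear layout leaves eight of the ten devices empty, while the striped layout spreads the same data evenly; likewise, a compaction that fits easily in the pooled free space can fail when each device has to hold the output on its own.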

-Jim

Jim Wartnick
  • That really helps, but it brings up a follow-up question. If JBOD is messy and LVM cannot be expanded, what other options are there? When working with LVM volumes, is it better to make one very large volume with all 10 disks or to break it up into smaller volumes? Is there a third option that gives me expandability and is not messy? – Heath Curry Jun 10 '19 at 12:15
  • The only other thing I can think of is a hybrid approach. Use LVM striping across all of your devices and have a single data directory (one entry under `data_file_directories` in cassandra.yaml). If you need to expand, add another striped LVM volume and add it as a second data directory. You then have Cassandra "striping" between the two data directories and LVM striping across the actual devices. – Jim Wartnick Jun 10 '19 at 15:15
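The hybrid approach from the last comment could be sketched roughly as follows. This Python snippet only builds the command strings and the cassandra.yaml fragment; it does not run anything. The device paths, volume group and logical volume names, mount points, and the 64k stripe size are hypothetical placeholders, and filesystem creation and mounting are left out.

```python
def striped_volume_commands(vg_name, lv_name, devices, stripe_size="64k"):
    """Build LVM commands for one striped volume over `devices`.

    Nothing is executed; review the strings before running them as root.
    """
    devs = " ".join(devices)
    return [
        f"pvcreate {devs}",
        f"vgcreate {vg_name} {devs}",
        # one stripe per device, using all free extents in the volume group
        f"lvcreate --type striped --stripes {len(devices)} "
        f"--stripesize {stripe_size} --extents 100%FREE "
        f"--name {lv_name} {vg_name}",
    ]


def data_file_directories_yaml(mount_points):
    """Render the data_file_directories list for cassandra.yaml."""
    lines = ["data_file_directories:"]
    lines += [f"    - {path}" for path in mount_points]
    return "\n".join(lines)


if __name__ == "__main__":
    # Initial setup: ten hypothetical devices in one striped volume
    first = [f"/dev/sd{c}" for c in "bcdefghijk"]
    for cmd in striped_volume_commands("vg_cass1", "data1", first):
        print(cmd)
    # Later expansion: a second striped volume on newly added devices
    second = ["/dev/sdl", "/dev/sdm"]
    for cmd in striped_volume_commands("vg_cass2", "data2", second):
        print(cmd)
    # Both mount points (hypothetical) become Cassandra data directories
    print(data_file_directories_yaml(
        ["/var/lib/cassandra/data1", "/var/lib/cassandra/data2"]))
```

Keeping each expansion as its own striped volume avoids restriping the original one, at the cost of bringing back a small amount of the JBOD-style distribution between data directories.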