
I'm running a Cassandra cluster with 5 nodes, each with ten 1 TB disks (JBOD). Currently, one of the nodes is in a problematic situation: a large compaction can no longer complete successfully because it runs out of space on a single disk.

I am trying to figure out what the effect will be of adding additional disks to the JBOD configuration (a rough sketch of what I mean follows the questions below).

  1. Will the existing data be redistributed automatically to utilize the new disk optimally?
  2. Will only new data be written to the newly added disks?
  3. Can I manually move SSTables to different disks?
  4. Is splitting the SSTables an option?
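
For reference, this is roughly what I mean by "adding additional disks"; the mount points and paths below are hypothetical examples, not my actual layout:

    # Hypothetical example: prepare an eleventh disk for the JBOD layout.
    # All paths are placeholders, not the real ones from my cluster.
    sudo mkdir -p /data/disk11/cassandra
    sudo chown -R cassandra:cassandra /data/disk11/cassandra

    # Then add the new directory to data_file_directories in cassandra.yaml:
    #   data_file_directories:
    #     - /data/disk01/cassandra
    #     ...
    #     - /data/disk10/cassandra
    #     - /data/disk11/cassandra    # newly added disk
    # and restart the node so it picks up the new directory.
    sudo systemctl restart cassandra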

I found sources online, but they are not fully conclusive.

RikJ

1 Answer


Writing to the new disks will happen for new data, and during compaction. The actual logic depends on the Cassandra version; newer versions, for example, put specific partition (token) ranges onto specific disks. The usual recommendation is to use RAID-0 to get one big disk, so you don't have problems with big SSTables. But this method has one disadvantage: if you lose one disk, you lose all the data on the node and need to rebuild everything.
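
If you want to see how the data is currently spread across the disks, something like this gives a quick picture (mount points, keyspace and table names are made-up examples, assuming each JBOD disk has its own entry in data_file_directories):

    # Sketch: free space per JBOD disk, and how much data each data
    # directory holds. Paths/names below are hypothetical examples.
    df -h /data/disk*
    du -sh /data/disk*/cassandra

    # Per-table view: how one table's SSTables are spread over the disks.
    du -sh /data/disk*/cassandra/my_keyspace/my_table-*/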

Theoretically you can move some SSTables to other disks manually (with the node stopped), and Cassandra will then reallocate data during compaction, but I haven't tried it for this configuration. There is another problem with that action: if a moved SSTable contains deleted or updated data that is shadowed by newer data on another disk, and that other disk crashes, the deleted/old data can be resurrected.
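
A rough sketch of how such a manual move could look, with the node stopped (I haven't tested this on that configuration; all paths, the table directory name and the SSTable generation are made-up examples, and every component file of an SSTable has to move together):

    # Sketch only - untested for this setup; all names are hypothetical.
    sudo systemctl stop cassandra

    # An SSTable is a set of component files sharing one prefix
    # (*-Data.db, *-Index.db, *-Statistics.db, *-Summary.db, ...);
    # move them together into the same keyspace/table directory
    # structure on the target disk.
    TABLE_DIR="my_keyspace/my_table-5a1c3b70a7e611eb8c2f000000000000"  # hypothetical
    sudo mkdir -p /data/disk11/cassandra/"$TABLE_DIR"
    sudo mv /data/disk03/cassandra/"$TABLE_DIR"/nb-42-big-* \
            /data/disk11/cassandra/"$TABLE_DIR"/
    sudo chown -R cassandra:cassandra /data/disk11/cassandra

    sudo systemctl start cassandra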

P.S. 10 TB per node is crazy. Just think how much time it will take to rebuild a single node if the server breaks.

Alex Ott
  • Thanks for the reply! "Writing to new disks will happen for new data, and during compaction. Actual logic depends on the Cassandra version". So, combined with http://www.datastax.com/dev/blog/improving-jbod , does this mean compaction will always take place on the local (problematic) disk for Cassandra 3.2+? – RikJ Apr 09 '21 at 15:01
  • Regarding the capacity of 10 TB, most data is not hot, so latency shouldn't suffer too much according to https://community.datastax.com/questions/1902/what-is-the-data-size-per-node-supported-in-apache.html . Rebuilding could be problematic though. – RikJ Apr 09 '21 at 15:02
  • Moving the SSTables seems like a risky operation at the moment. My current strategy is to scale vertically (add extra disks), remove the problematic node, wipe it, and re-add it to the cluster. – RikJ Apr 09 '21 at 15:03
  • when you add the disk, Cassandra should recalculate the ranges allocated to each disk, so when compaction happens there is a chance that part of the data will be moved to the other disks – Alex Ott Apr 09 '21 at 15:31
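
A rough sketch of how that redistribution can be nudged along once the new disk is in place. If I remember correctly, nodetool relocatesstables exists since Cassandra 3.2 for exactly this (moving SSTables to the data directory that owns their token range); the keyspace and table names below are made-up examples:

    # Sketch, assuming Cassandra 3.2+ with per-disk token ranges.
    # Keyspace/table names are hypothetical examples.
    nodetool relocatesstables my_keyspace my_table

    # Watch progress of the resulting compactions/moves.
    nodetool compactionstats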