
I'm running a Cassandra cluster with 5 nodes, each with ten 1 TB disks (JBOD). Currently, one of the nodes is in a problematic situation: a large compaction can no longer complete successfully because it runs out of space on a single disk.

I am trying to figure out what the effect will be of adding additional disks to the JBOD configuration (a rough sketch of what I mean follows the questions below).

  1. Will the existing data be redistributed automatically to utilize the new disk optimally?
  2. Will only new data be written to the newly added disks?
  3. Can I manually move SSTables to different disks?
  4. Is splitting the SSTables an option?
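
For reference, this is roughly what I mean by "adding additional disks"; the mount points and paths below are hypothetical examples, not my actual layout:

    # Hypothetical example: prepare an eleventh disk for the JBOD layout.
    # All paths are placeholders, not the real ones from my cluster.
    sudo mkdir -p /data/disk11/cassandra
    sudo chown -R cassandra:cassandra /data/disk11/cassandra

    # Then add the new directory to data_file_directories in cassandra.yaml:
    #   data_file_directories:
    #     - /data/disk01/cassandra
    #     ...
    #     - /data/disk10/cassandra
    #     - /data/disk11/cassandra    # newly added disk
    # and restart the node so it picks up the new directory.
    sudo systemctl restart cassandra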

I found sources online, but they are not fully conclusive.

RikJ

1 Answer


Writing to the new disks will happen for new data, and during compaction. The actual logic depends on the Cassandra version; newer versions, for example, put specific partition (token) ranges onto specific disks. The usual recommendation is to use RAID-0 to get one big disk, so you don't have problems with big SSTables. But this method has one disadvantage: if you lose one disk, you lose all the data on the node and need to rebuild everything.
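
If you want to see how the data is currently spread across the disks, something like this gives a quick picture (mount points, keyspace and table names are made-up examples, assuming each JBOD disk has its own entry in data_file_directories):

    # Sketch: free space per JBOD disk, and how much data each data
    # directory holds. Paths/names below are hypothetical examples.
    df -h /data/disk*
    du -sh /data/disk*/cassandra

    # Per-table view: how one table's SSTables are spread over the disks.
    du -sh /data/disk*/cassandra/my_keyspace/my_table-*/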

Theoretically you can move some SSTables to other disks manually (with the node stopped), and Cassandra will then reallocate data during compaction, but I haven't tried it for this configuration. There is another problem with that action: if a moved SSTable contains deleted or updated data that is shadowed by newer data on another disk, and that other disk crashes, the deleted/old data can be resurrected.
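
A rough sketch of how such a manual move could look, with the node stopped (I haven't tested this on that configuration; all paths, the table directory name and the SSTable generation are made-up examples, and every component file of an SSTable has to move together):

    # Sketch only - untested for this setup; all names are hypothetical.
    sudo systemctl stop cassandra

    # An SSTable is a set of component files sharing one prefix
    # (*-Data.db, *-Index.db, *-Statistics.db, *-Summary.db, ...);
    # move them together into the same keyspace/table directory
    # structure on the target disk.
    TABLE_DIR="my_keyspace/my_table-5a1c3b70a7e611eb8c2f000000000000"  # hypothetical
    sudo mkdir -p /data/disk11/cassandra/"$TABLE_DIR"
    sudo mv /data/disk03/cassandra/"$TABLE_DIR"/nb-42-big-* \
            /data/disk11/cassandra/"$TABLE_DIR"/
    sudo chown -R cassandra:cassandra /data/disk11/cassandra

    sudo systemctl start cassandra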

P.S. 10 TB per node is crazy. Just think how much time it will take to rebuild a single node if the server breaks.

Alex Ott
  • Thanks for the reply! "Writing to new disks will happen for new data, and during compaction. Actual logic depends on the Cassandra version". So, combined with http://www.datastax.com/dev/blog/improving-jbod , does this mean compaction will always take place on the local (problematic) disk for Cassandra 3.2+? – RikJ Apr 09 '21 at 15:01
  • Regarding the capacity of 10 TB, most data is not hot, so latency shouldn't suffer too much according to https://community.datastax.com/questions/1902/what-is-the-data-size-per-node-supported-in-apache.html . Rebuilding could be problematic though. – RikJ Apr 09 '21 at 15:02
  • Moving the SSTables seems like a risky operation at the moment. My current strategy is to scale vertically (add extra disks), remove the problematic node, wipe it, and re-add it to the cluster. – RikJ Apr 09 '21 at 15:03
  • when you add the disk, Cassandra should recalculate the ranges allocated to each disk, so when compaction happens there is a chance that part of the data will be moved to the other disks – Alex Ott Apr 09 '21 at 15:31
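
A rough sketch of how that redistribution can be nudged along once the new disk is in place. If I remember correctly, nodetool relocatesstables exists since Cassandra 3.2 for exactly this (moving SSTables to the data directory that owns their token range); the keyspace and table names below are made-up examples:

    # Sketch, assuming Cassandra 3.2+ with per-disk token ranges.
    # Keyspace/table names are hypothetical examples.
    nodetool relocatesstables my_keyspace my_table

    # Watch progress of the resulting compactions/moves.
    nodetool compactionstats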