We have a 14-node DSC 2.1.15 cluster using STCS. The number of sstables seems to hover around a stable value even as we insert more and more data, so we are now starting to see sstable data files in excess of 1 TB. See graphs:
Reading this, we fear that such large files might postpone the compaction of tombstones that would finally release space, since we would have to wait for at least 4 similarly sized sstables to be created.
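To make the concern concrete, here is a rough sketch of size-tiered bucketing as we understand it. It is a simplified model, not Cassandra's actual code: it assumes the default STCS options (`bucket_low` 0.5, `bucket_high` 1.5, `min_threshold` 4, `min_sstable_size` 50 MB), and the sstable sizes are made up for illustration.

```python
GB = 1024 ** 3

def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5,
                 min_sstable_size=50 * 1024 ** 2):
    """Group sstable sizes (in bytes) into buckets of similar size."""
    buckets = []  # each entry is [average_size, [member sizes]]
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket[0]
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size, [size]])
    return [members for _, members in buckets]

def compaction_candidates(sizes, min_threshold=4):
    """Return only the buckets holding at least min_threshold sstables."""
    return [b for b in stcs_buckets(sizes) if len(b) >= min_threshold]

# Hypothetical sstable sizes on one node.
sizes = [1100 * GB, 300 * GB, 280 * GB, 90 * GB, 85 * GB, 80 * GB, 75 * GB]
print([[s // GB for s in b] for b in compaction_candidates(sizes)])
# prints [[75, 80, 85, 90]]
```

In this model the 1100 GB sstable sits alone in its bucket and is not picked up by size-tiered compaction until three more sstables of comparable size exist, which is the delay we are worried about.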
Every node currently has two data directories. We were hoping Cassandra would spread data across those directories so that they use roughly equal space, but as sstables grow through compaction, we fear ending up with larger and larger sstables, perhaps mostly in one data directory.
How could we control this better? Maybe with LCS, or...?
How do we determine a sweet spot for the number of sstables vs. their sizes?
What affects the number of sstables, their sizes, and which data directory they get placed in?
Currently a few nodes are beginning to look skewed:
/dev/mapper/vg--blob1-lv--blob1 6.4T 3.3T 3.1T 52% /blob/1
/dev/mapper/vg--blob2-lv--blob2 6.6T 545G 6.1T 9% /blob/2
Could we stop a node, merge all of a keyspace's sstables into one data directory (they seem to be uniquely named with an id/sequence number even though they are spread across two data directories), expand the underlying volume, and restart the node, thus avoiding running out of space when only one data directory's file system fills up?
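Before attempting such a merge, we would want to verify the uniqueness assumption rather than rely on it. Below is a minimal sketch of a dry-run check, assuming both data directories share the same relative layout (keyspace/table/file) below their roots. The function name `find_collisions` is our own, and the script only reads the directory trees; it does not move anything.

```python
from pathlib import Path

def find_collisions(src_dir, dst_dir):
    """List files under src_dir whose relative path already exists under dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    collisions = []
    for path in sorted(src.rglob("*")):
        if path.is_file():
            rel = path.relative_to(src)
            if (dst / rel).exists():
                # Moving this file into dst_dir would overwrite an existing file.
                collisions.append(rel.as_posix())
    return collisions
```

With the node stopped, calling `find_collisions` with the second data directory as `src_dir` and the first as `dst_dir` should return an empty list if the sstable files really are uniquely named across both directories.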