
I am running cqlsh version 5.0.1 on a 6-node cluster. I recently did a major data cleanup in a table that uses SizeTieredCompactionStrategy in order to free some disk space, but the space was not reclaimed. The cleanup created a lot of tombstones, and my SSTables are of very uneven sizes, so they are not getting compacted together. I tried to compact the SSTables manually using the nodetool compact --user-defined option, but my nodetool version is 3.0.15, which does not support it. Is there some other way to compact specific SSTables? I don't want to run a major compaction on the complete table, as that could be very unpredictable. Below I have attached my SSTable listing and cfstats.

sstables:

-rw-r--r-- 1 cassandra cassandra 523G Jun 17  2021 mc-153814-big-Data.db
-rw-r--r-- 1 cassandra cassandra 178G Sep  8 11:19 mc-223618-big-Data.db
-rw-r--r-- 1 cassandra cassandra 370M Oct 20 00:05 mc-259673-big-Data.db
-rw-r--r-- 1 cassandra cassandra 181G Dec  7 00:58 mc-308912-big-Data.db
-rw-r--r-- 1 cassandra cassandra  47G Dec 23 23:29 mc-331310-big-Data.db
-rw-r--r-- 1 cassandra cassandra  13G Dec 27 21:46 mc-335805-big-Data.db
-rw-r--r-- 1 cassandra cassandra  13G Dec 31 18:30 mc-340584-big-Data.db
-rw-r--r-- 1 cassandra cassandra 3.3G Jan  1 19:12 mc-341882-big-Data.db
-rw-r--r-- 1 cassandra cassandra 3.2G Jan  2 21:18 mc-343095-big-Data.db
-rw-r--r-- 1 cassandra cassandra 828M Jan  3 04:25 mc-343352-big-Data.db
-rw-r--r-- 1 cassandra cassandra  58M Jan  3 04:54 mc-343377-big-Data.db
-rw-r--r-- 1 cassandra cassandra  55M Jan  3 05:21 mc-343394-big-Data.db
-rw-r--r-- 1 cassandra cassandra  18M Jan  3 05:29 mc-343399-big-Data.db
-rw-r--r-- 1 cassandra cassandra 4.7M Jan  3 05:30 mc-343400-big-Data.db
-rw-r--r-- 1 cassandra cassandra 5.7M Jan  3 05:33 mc-343401-big-Data.db
-rw-r--r-- 1 cassandra cassandra 230G Dec 24  2020 mc-36042-big-Data.db
-rw-r--r-- 1 cassandra cassandra 380G Jan  4  2021 mc-49122-big-Data.db
-rw-r--r-- 1 cassandra cassandra 8.1G Jan  6  2021 mc-53514-big-Data.db
-rw-r--r-- 1 cassandra cassandra  82G Jan 10  2021 mc-55238-big-Data.db
-rw-r--r-- 1 cassandra cassandra 5.7G Jan 15  2021 mc-56742-big-Data.db

cfstats:

Keyspace: events
    Read Count: 26115727
    Read Latency: 7.895873181627301 ms.
    Write Count: 510188706
    Write Latency: 0.17134826153129307 ms.
    Pending Flushes: 0
        Table: event_track
        SSTable count: 20
        Space used (live): 1.65 TB
        Space used (total): 1.65 TB
        Space used by snapshots (total): 0 bytes
        Off heap memory used (total): 2.09 GB
        SSTable Compression Ratio: 0.13917185434273555
        Number of partitions (estimate): 252702273
        Memtable cell count: 19390
        Memtable data size: 32.35 MB
        Memtable off heap memory used: 0 bytes
        Memtable switch count: 24677
        Local read count: 26115728
        Local read latency: 7.580 ms
        Local write count: 510188708
        Local write latency: 0.151 ms
        Pending flushes: 0
        Bloom filter false positives: 333
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 397.03 MB
        Bloom filter off heap memory used: 397.03 MB
        Index summary off heap memory used: 144.78 MB
        Compression metadata off heap memory used: 1.56 GB
        Compacted partition minimum bytes: 51 bytes
        Compacted partition maximum bytes: 307.45 MB
        Compacted partition mean bytes: 44.67 KB
        Average live cells per slice (last five minutes): 11.18867924528302
        Maximum live cells per slice (last five minutes): 372
        Average tombstones per slice (last five minutes): 10.617424242424242
        Maximum tombstones per slice (last five minutes): 1109

Please suggest a way to free some disk space; the data is growing day by day, and it is not possible to add more disk.

Yash Tandon

1 Answer


You can still perform a user-defined compaction, although for your version it has to be done via JMX. Full instructions can be found in this great blog post from The Last Pickle; the short version is:

  • get a JMX client, for example jmxterm
  • run the forceUserDefinedCompaction operation of the bean org.apache.cassandra.db:type=CompactionManager, passing the file name(s) as parameters (use full paths if necessary):
run -b org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction mc-341882-big-Data.db,mc-343401-big-Data.db
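For illustration, a complete jmxterm session could look roughly like the sketch below. The jar file name and the JMX endpoint localhost:7199 are assumptions (7199 is Cassandra's default JMX port); adjust them to your environment, and pass the full on-disk paths of the Data.db files if the bare names are not resolved:

java -jar jmxterm-1.0.2-uber.jar
open localhost:7199
run -b org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction mc-341882-big-Data.db,mc-343401-big-Data.db
close
quit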
Alex Ott
  • Will definitely try this. Also, is it OK to reduce min_threshold from the default of 4 to 2? I have many SSTable pairs of the same size, but it takes a long time for large SSTables to form a group of 4. I am using this command: ALTER TABLE event_track WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 2}; – Yash Tandon Jan 03 '22 at 07:34
  • Yes, it's OK to decrease min_threshold, but this may kick off multiple compactions at once. – Alex Ott Jan 03 '22 at 07:44
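If you do lower min_threshold, it may be worth watching the compactions it kicks off with the standard nodetool commands (the -H flag just prints human-readable sizes; plain compactionstats works too):

nodetool compactionstats -H
nodetool getcompactionthroughput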