I'm trying to free some disk space in Cassandra (C*).
I've deleted many rows, which created many tombstones.
I'm running nodetool garbagecollect and was wondering what this tool does behind the scenes. I've read that it deletes the actual data that the tombstones shadow, but not the tombstones themselves (which will be cleared after gc_grace_seconds). Is that accurate? Does the garbagecollect tool have no relationship to the gc_grace_seconds parameter? How does garbagecollect actually release disk space?

There is not a lot of documentation on how this tool works and what it does.

Any help will be much appreciated.

yaarix

1 Answer

Deleting data in Cassandra always writes more data first (a tombstone), so you need to be careful with it.

nodetool garbagecollect performs single-SSTable compactions to remove overwritten or logically deleted data. For each SSTable, it creates a new SSTable with the unneeded data cleaned out. By default, garbagecollect removes rows or partitions that have been deleted or overwritten with newer data. It can also remove deleted or overwritten cell values if the -g CELL option is specified, but this requires more resources (I/O and CPU). This command may also remove expired tombstones (older than gc_grace_seconds), but not the fresh ones. There are also other limitations on tombstone removal.
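To make the two ideas above concrete, here is a minimal Python sketch. The helper names `garbagecollect_cmd` and `tombstone_is_expired` are made up for illustration and are not part of Cassandra; the keyspace and table names are placeholders. The first helper only builds the command line and runs nothing. The second shows the age check that separates "expired" from "fresh" tombstones, assuming the default gc_grace_seconds of 864000 (10 days). Being expired is necessary but not always sufficient for a tombstone to be dropped, as noted above.

```python
import shlex
from datetime import datetime, timedelta, timezone

# Cassandra's default gc_grace_seconds (10 days); a table may override it.
DEFAULT_GC_GRACE_SECONDS = 864000


def garbagecollect_cmd(keyspace, table, granularity="ROW"):
    """Build (but do not run) a nodetool garbagecollect command line.

    granularity is ROW (the default) or CELL; CELL costs more I/O and CPU.
    """
    if granularity not in ("ROW", "CELL"):
        raise ValueError("granularity must be ROW or CELL")
    return ["nodetool", "garbagecollect", "-g", granularity, keyspace, table]


def tombstone_is_expired(deleted_at, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS, now=None):
    """Return True if the tombstone is older than gc_grace_seconds.

    Only expired tombstones are candidates for removal; fresh ones are kept.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - deleted_at > timedelta(seconds=gc_grace_seconds)


if __name__ == "__main__":
    # Placeholder keyspace/table names; pass the list to subprocess.run to execute.
    print(shlex.join(garbagecollect_cmd("my_keyspace", "my_table", "CELL")))
    two_days_ago = datetime.now(timezone.utc) - timedelta(days=2)
    print(tombstone_is_expired(two_days_ago))
```

With the default setting, a tombstone written two days ago is still fresh, so neither garbagecollect nor a regular compaction would drop it yet.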

If expired tombstones still exist afterwards, then only a major compaction may help to evict them, for example by running nodetool compact -s on the individual tables. Make sure you have enough free disk space first: up to the size of the table itself.
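The free-space precaution can be checked before starting a major compaction. The following is a rough Python sketch, not an official tool: the helper names are hypothetical, and the table directory you pass in depends on your data_file_directories setting. It sums the files under the table directory and compares the total with the free space on the same volume. Because it also counts any snapshots or backups stored under that directory, the estimate is on the conservative side.

```python
import os
import shutil


def directory_size_bytes(path):
    """Sum the sizes of all files under path (including subdirectories)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def has_room_for_major_compaction(table_dir, free_bytes=None):
    """Return True if free space is at least the current size of the table.

    free_bytes can be passed explicitly; by default it is read from the
    volume that holds table_dir.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(table_dir).free
    return free_bytes >= directory_size_bytes(table_dir)


def major_compaction_cmd(keyspace, table):
    """Build (but do not run) a major compaction command with split output."""
    return ["nodetool", "compact", "-s", keyspace, table]
```

A caller would run the command returned by `major_compaction_cmd` (for example with `subprocess.run`) only when `has_room_for_major_compaction` returns True for the table's data directory.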

Alex Ott
  • Thanks! What do you mean by "This command may also remove expired tombstones (older than gc_grace_seconds), but not the fresh ones"? What are fresh tombstones? If I run compact with the --split-output flag, do I still need free disk space equal to the size of the table itself? Why do I need so much disk space when compacting? Is that because the old SSTables stay on disk until compaction is finished? – yaarix Feb 19 '20 at 13:10
  • Fresh tombstones are those that have not reached `gc_grace_seconds` yet. Regarding disk space: compaction doesn't rewrite the existing SSTables in place; it writes new files and removes the old ones only after the process has finished. That's why you may need free disk space of up to the table size. – Alex Ott Feb 19 '20 at 13:43
  • Thanks. So both compact and garbagecollect will remove old tombstones? – yaarix Feb 19 '20 at 14:21
  • `garbagecollect` is more limited in functionality because it works on individual SSTables, so it may not completely clean up old tombstones; that's why I wrote that it **may** clean them up. `compact` works on a bigger data set and can clean up more thoroughly, but it requires more disk space. – Alex Ott Feb 19 '20 at 14:44
  • Thanks. Is it OK to ask you to elaborate a bit on why working with individual SSTables may not clean up old tombstones? – yaarix Feb 19 '20 at 14:55
  • This article is very good: https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html – Alex Ott Feb 19 '20 at 16:17
  • Thanks very much! It helped in reclaiming the disk space. – charybr May 16 '20 at 16:00