Cassandra blobs, tombstones and space reclamation

Question

I'm trying to understand how quickly space is reclaimed in Cassandra after deletes. I've found a number of articles that describe tombstoning and the problems this can create when you are doing range queries and Cassandra has to scan through lots of tombstoned rows to find the much more scarce live ones. And I get that you can't set gc_grace_seconds too low or you will have zombie records that can pop up if a node goes offline and comes back after the tombstones disappeared off the remaining machines. That all makes sense.

However, if the tombstone is placed on the key, then it should be possible for the space from the rest of the row data to be reclaimed.

So my question is, for this table:

create table somedata (
  category text,
  id timeuuid,
  data blob,
  primary key ((category), id)
);

If I insert and then remove a number of records in this table and take care not to run into the tombstone+range issues described above and at length elsewhere, when will the space for those blobs be reclaimed?
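
For concreteness, the insert-then-delete pattern described here might look like the following sketch. The category, id, and blob values are placeholders chosen for illustration, not taken from a real workload.

```cql
-- Illustrative values only: the category, id, and blob below are made up.
INSERT INTO somedata (category, id, data)
VALUES ('reports', 50554d6e-29bb-11e5-b345-feff819cdc9f, 0xCAFEBABE);

-- Deleting the row writes a tombstone rather than removing data in place.
DELETE FROM somedata
WHERE category = 'reports'
  AND id = 50554d6e-29bb-11e5-b345-feff819cdc9f;
```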

In my case, the blobs may be larger than the recommended size (1 MB, I believe), but they should not be larger than ~15 MB, which I think is still workable. But it makes a big difference in disk space whether all of those blobs stick around for 10 days (the default gc_grace_seconds value) or only the keys do.

When I looked I couldn't find this particular aspect described anywhere.

score 1 · Accepted Answer · answered Nov 23 '16 at 18:18

The space will be reclaimed once the gc_grace_seconds period has elapsed; until then, both the keys and the blobs stick around. Also consider that disk usage grows if you have updates (which are stored as different versions of the same record, identified by the timestamp at which each was written) and with the replication factor used (the number of copies of the same record distributed across the nodes).

You will always have trade-offs between fault resilience and disk usage. How you tune your settings (gc_grace_seconds, TTL, replication factor, consistency level) will depend on your use case and the SLAs you need to fulfill.
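
As a rough sketch of what tuning these settings looks like for the table in the question, the statements below adjust gc_grace_seconds and a default TTL. The specific values and the keyspace name `my_keyspace` are assumptions made for illustration, not recommendations.

```cql
-- Assumed values for illustration; choose them from your own repair schedule.
-- Shorten the tombstone grace period from the 10-day default (864000 s) to 1 day.
ALTER TABLE somedata WITH gc_grace_seconds = 86400;

-- Optionally expire rows automatically 7 days after they are written.
ALTER TABLE somedata WITH default_time_to_live = 604800;

-- Check the current settings (Cassandra 3.0+; 'my_keyspace' is a hypothetical keyspace name).
SELECT gc_grace_seconds, default_time_to_live
FROM system_schema.tables
WHERE keyspace_name = 'my_keyspace' AND table_name = 'somedata';
```

A shorter gc_grace_seconds only makes sense if repairs complete on every node within that window, for the zombie-record reason raised in the question.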

Carlos Monroy Nieblas

Gotcha. What about an overwrite (of the blob value/column)? In that case, is it possible or probable that a compaction occurring earlier than gc_grace_seconds will cause the old blob to be completely discarded, since it has now been replaced with a new one? – Brad Peabody Nov 23 '16 at 18:23
No, there is no overwrite, because records are immutable. If the information is updated, Cassandra creates a new record with a different timestamp, and the record with the latest timestamp is considered the current one. The original record stays for the time defined by gc_grace_seconds. – Carlos Monroy Nieblas Nov 23 '16 at 18:27
Okay, thanks, that clears it up. So I guess the key is to figure out a value for gc_grace_seconds that walks the line between how long you want to keep things around, so as to properly handle nodes going offline and coming back, and how much extra disk space that will take up. – Brad Peabody Nov 23 '16 at 18:31
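
To make the overwrite case from the comments concrete, here is a minimal CQL sketch against the table from the question. The id and blob values are placeholders; `writetime(data)` returns the write timestamp Cassandra uses to decide which version is current.

```cql
-- Placeholder values; the id is an arbitrary timeuuid used for illustration.
INSERT INTO somedata (category, id, data)
VALUES ('reports', 50554d6e-29bb-11e5-b345-feff819cdc9f, 0x01);

-- Writing the same primary key again adds a newer version; nothing is modified in place.
UPDATE somedata SET data = 0x02
WHERE category = 'reports'
  AND id = 50554d6e-29bb-11e5-b345-feff819cdc9f;

-- Reads return only the newest version; writetime() shows its write timestamp (microseconds).
SELECT data, writetime(data) FROM somedata
WHERE category = 'reports'
  AND id = 50554d6e-29bb-11e5-b345-feff819cdc9f;
```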
