We have a Cassandra cluster (2.1.11) with 15 nodes, replication factor 3 on SSD drives.

One of the tables occupies 12 TB. Its live disk space and total disk space are the same. I verified that OpsCenter, the JMX metrics, and the actual folder size on the file system all report this number.

We are running short of space, so we deleted 35% of the data (each entry is 104 bytes, so we removed billions of rows).

However, we have gained no free space at all, although we see a lot of compactions going on while we delete entries.

Since then, we have run nodetool repair and nodetool cleanup and restarted the JVM process, with no luck.

Does anybody know what else I can try?

  • Note the GC grace period: if you are running short on disk, you could lower it for now and trigger compactions. – Jeff Beck Oct 26 '16 at 22:40
  • Thank you. We've been running a nightly cleanup batch process for a week, so it hasn't been 10 days yet. We might change this value and restart the process. I will update on how that goes. – Hidetomo Morimoto Oct 28 '16 at 17:14
  • We have set gc_grace_seconds to 3 days and started the repair process. We have not restarted the process. I certainly see a downward trend, but it's very slow: over the last 3 days, only 30 GB of space has been freed. Should we restart all the boxes, or wait until the entire repair process finishes? A repair usually takes 7 to 10 days for us. – Hidetomo Morimoto Nov 04 '16 at 18:36
  • It's compactions, not repairs, that free disk space. If you are using STCS, there is no guarantee that all deleted data will be cleaned up in a timely fashion. You may need to consider leveled compaction. – Jeff Beck Nov 04 '16 at 18:44
  • Thank you. We're using LeveledCompactionStrategy on that particular index table. We'll stop the repair process and run nodetool compact instead. – Hidetomo Morimoto Nov 04 '16 at 23:17

1 Answer

Expect to wait gc_grace_seconds (10 days by default) before the tombstones generated by your deletes become eligible for removal. Disk space is only released once a compaction actually purges those tombstones along with the data they shadow. So plan ahead :)
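To make the waiting period concrete, here is a minimal Python sketch of the arithmetic only. It does not talk to Cassandra; the helper names and the deletion date are hypothetical, chosen for illustration. The one grounded value is the default gc_grace_seconds of 864000 (10 days).

```python
from datetime import datetime, timedelta, timezone

# Cassandra's default gc_grace_seconds: 10 days.
DEFAULT_GC_GRACE_SECONDS = 864000


def tombstone_purgeable_at(deleted_at, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Earliest time a tombstone written at `deleted_at` may be purged.

    Hypothetical helper: it only adds the grace period to the delete time.
    """
    return deleted_at + timedelta(seconds=gc_grace_seconds)


def is_purgeable(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True once the grace period has fully elapsed.

    Eligibility is necessary but not sufficient: space is reclaimed only
    when a compaction actually rewrites the SSTables involved.
    """
    return now >= tombstone_purgeable_at(deleted_at, gc_grace_seconds)


if __name__ == "__main__":
    # Assumed deletion date, for illustration only.
    deleted = datetime(2016, 10, 26, tzinfo=timezone.utc)
    three_days = 3 * 24 * 60 * 60  # 259200 seconds

    print("default grace:", tombstone_purgeable_at(deleted))
    print("3-day grace:  ", tombstone_purgeable_at(deleted, three_days))
```

Lowering gc_grace_seconds moves the eligibility date earlier, but a tombstone that is past its grace period still occupies disk until a compaction includes the SSTable that holds it.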

Here's a good link on understanding the inner workings of Cassandra and why deleting data does not immediately release disk space. You may also want to look at this link on how to run a user-defined compaction.