$ cd /tmp
$ cp -r /var/lib/cassandra/data/keyspace/table-6e9e81a0808811e9ace14f79cedcfbc4 .
$ nodetool compact --user-defined table-6e9e81a0808811e9ace14f79cedcfbc4/*-Data.db

I expected the two SSTables (where the second one contains only tombstones) to be merged into one, which would be equivalent to the first one minus data masked by tombstones from the second one.

However, the last command exits with status 0 and nothing changes in the table-6e9e81a0808811e9ace14f79cedcfbc4 directory (both SSTables are still there). Any ideas on how to unconditionally merge potentially multiple SSTables into one offline (like above, not on SSTable files currently used by the running cluster)?

Alexander Shukaev

1 Answer


Just run nodetool compact <keyspace> <table>. There is no real offline compaction; you can only tell Cassandra which SSTables to compact. User-defined compaction just lets you give it a custom list of SSTables, whereas a major compaction (the command above) includes all SSTables in a table.
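As a rough sketch of the user-defined variant: the helper below collects every *-Data.db file in a table's data directory and prints the nodetool command instead of running it, so you can review it first. The function name build_compact_cmd is made up for illustration, and it assumes you point it at the directory the running node actually uses. As far as I understand, files copied elsewhere (such as /tmp) are not active SSTables and get skipped, which would explain the silent no-op in the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cassandra): print a user-defined
# compaction command covering every *-Data.db file in a table directory.
# Assumption: the directory is the node's live data directory for the table.
build_compact_cmd() {
    table_dir=$1
    set --                              # reuse positional params as the file list
    for f in "$table_dir"/*-Data.db; do
        [ -e "$f" ] || continue         # skip the unexpanded glob when nothing matches
        set -- "$@" "$f"
    done
    if [ "$#" -eq 0 ]; then
        echo "no SSTables found in $table_dir" >&2
        return 1
    fi
    # Only printed, never executed; paths containing spaces would need quoting.
    echo "nodetool compact --user-defined $*"
}

# Example (path taken from the question):
# build_compact_cmd /var/lib/cassandra/data/keyspace/table-6e9e81a0808811e9ace14f79cedcfbc4
```

If the printed command looks right, run it on the node that owns those files.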

Whether it will work really depends on which version you're using, but there is https://github.com/tolbertam/sstable-tools#compact available. If you're desperate, you can import cassandra-all for your version and do what it does: https://github.com/tolbertam/sstable-tools/blob/master/src/main/java/com/csforge/sstable/Compact.java

Chris Lohfink
  • Nice pointers, very helpful, thank you! I will reimplement this Java application meanwhile and will let this question float for some time in case anybody else has more ideas. Otherwise, willing to accept your answer. – Alexander Shukaev Jun 03 '19 at 21:24
  • @AlexanderShukaev did you manage to compress the SSTables offline? Would you be willing to share what you did? This would be very helpful. Thanks – BuckBazooka Nov 22 '19 at 10:12
  • @BuckBazooka, what exactly are you looking for? All I did was to cut out the necessary code from that `sstable-tools` command line utility to create my own which does only merging for simplicity and corporate usage. You could as well just use the `sstable-tools` as is, no work is required there. – Alexander Shukaev Nov 23 '19 at 12:50
  • Well, the reason I asked was mainly that the code as is looked a bit suspicious to me. First, there appear to be leaks, and if you look at the code you see things like placeholders for the table name and keyspace, like turtles/turtle. I wondered if the generated compacted file was really correct. – BuckBazooka Nov 25 '19 at 12:06
  • There are some hacks, as it's built from the cassandra-all lib; the sstable-tools project was mostly to PoC things that slowly got moved into C* proper (multiple parts from that project are now in C*: sstabledump, sstablemetadata). The compact tool will likely be in C* proper in the future as well. It's safe though, using the same compaction code without the Purgers (which is likely safer than normal compaction). If you're writing your own, you can use code from sstabledump to avoid the turtle/turtle hacks etc. for generating TableMetadata. – Chris Lohfink Dec 03 '19 at 19:40