How does Cassandra (or Scylla) sort clustering columns?

Question

One of the benefits of Cassandra (or Scylla) is that:

When a table has multiple clustering columns, the data is stored in nested sort order. https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/whereClustering.html

Because of this, I would expect reading the data back in that same sorted order to be very fast.

If data is written in a different order than the clustering columns specify, when does Cassandra (or Scylla) actually re-order the data?

Is it when the memtables are flushed to SSTables?

What if a memtable has already been flushed, and I add a new record that should be before records in an existing SSTable?

Does it keep the data out of order on disk for a while and re-order it during compaction?

If so, what steps does it take to make sure reads are in the correct order?

Try reading in Scylla's `Architecture` page about sstables: https://docs.scylladb.com/architecture/sstable/ I think you'll find some of the answers in the `sstable interpretation` part: https://docs.scylladb.com/architecture/sstable/sstable-interpretation/ — TomerSan, Oct 28 '18 at 07:15

score 7 · Accepted Answer · answered Oct 29 '18 at 08:35

Data is always sorted in any given sstable.

When a memtable is flushed to disk, that will create a new sstable, which is sorted within itself. This happens naturally since memtables store data in sorted order, so no extra sorting is needed at that point. Sorting happens on insertion into the memtable.
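To make the "sorting happens on insertion" point concrete, here is a minimal sketch in Python. It is only a toy model, not how Cassandra or Scylla actually implement memtables: the `Memtable` class and its `insert` and `flush` methods are hypothetical names, and the clustering key is modeled as a plain tuple.

```python
import bisect


class Memtable:
    """Toy memtable (hypothetical): rows stay sorted by clustering key."""

    def __init__(self):
        self._keys = []    # clustering keys, always kept in sorted order
        self._values = []  # values, parallel to self._keys

    def insert(self, key, value):
        # Binary search for the slot where this key belongs.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # same key: overwrite in place
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def flush(self):
        # The rows are already sorted, so flushing is a straight copy.
        # The returned tuple stands in for an immutable sstable.
        sstable = tuple(zip(self._keys, self._values))
        self._keys, self._values = [], []
        return sstable
```

Rows can be inserted in any order, for example `(2, 'b')`, then `(1, 'z')`, then `(1, 'a')`, and `flush()` still returns them sorted by the first clustering column and then by the second.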

A read that uses the natural (clustering) order has to read from every sstable relevant to the query and merge those sorted results into one sorted result. This merging happens in memory, on the fly.
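The on-the-fly merge can be sketched as a k-way merge of already sorted inputs. This is an illustration under assumptions, not the real read path: `merged_read` is a hypothetical function, each row is modeled as `(clustering_key, write_timestamp, value)`, and when the same key appears in several sources the row with the highest timestamp wins.

```python
import heapq


def merged_read(*sources):
    """Lazily merge sorted sources (memtable or sstables) into one sorted stream.

    Each source must be sorted by clustering key with unique keys.
    Rows are (clustering_key, write_timestamp, value) tuples.
    """
    last_key = object()  # sentinel that never equals a real key
    # Order by key, then newest timestamp first, so the first row seen
    # for a given key is the most recent write.
    merged = heapq.merge(*sources, key=lambda row: (row[0], -row[1]))
    for key, _timestamp, value in merged:
        if key == last_key:
            continue  # older version of a row already emitted
        last_key = key
        yield key, value
```

Because `heapq.merge` is lazy, only one row per source needs to be held in memory at a time, which is why no full re-sort is needed at read time.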

Compaction, when it kicks in, will replace multiple sstables with one, creating a merged stream much like a regular read would do.
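As a rough sketch, compaction can be modeled as the same kind of merge, except that the result is written out as a new sstable that replaces the inputs. The `compact` function below is hypothetical; rows are assumed to be `(clustering_key, write_timestamp, value)` tuples, and real compaction also handles tombstones, TTLs and strategy-specific sstable selection, none of which is shown here.

```python
import heapq
from itertools import groupby


def compact(sstables):
    """Merge several sorted sstables into one new sorted sstable.

    For each clustering key, only the row with the newest timestamp is kept.
    """
    merged = heapq.merge(*sstables, key=lambda row: row[0])
    output = []
    for _key, versions in groupby(merged, key=lambda row: row[0]):
        output.append(max(versions, key=lambda row: row[1]))
    # A new immutable sstable; the caller would then discard the inputs.
    return tuple(output)
```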

This technique of storing data is known as a log-structured merge tree.

Great explanation and thank you. I have been trying to find documentation or write-ups about this online, but haven't found anything that talks about the merging multiple sstables into one result on-the-fly. — Drew LeSueur, Oct 29 '18 at 17:09

score 2 · Answer 2 · answered Oct 29 '18 at 08:36

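Data spread across several SSTables is merged into a single sorted SSTable during compaction. Each individual SSTable is already sorted when it is written.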

A regular write is essentially an append to the commit log plus an insert into the in-memory memtable, which is why it is so fast: no disk reads or seeks are involved.

When reading data, Cassandra reads from the active memtable and from one or more SSTables, then merges the results to satisfy the query.

As writes accumulate, a read may have to touch a growing number of SSTables. Compaction reorganizes the data on disk to reduce that overhead. Because SSTables are immutable, compaction writes new SSTables and then discards the old ones.

The process is similar in both Scylla and Cassandra.
