
As per the DataStax cassandra.yaml documentation (https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html):


compaction_throughput_mb_per_sec
(Default: 16) Throttles compaction to the specified total throughput across the entire system. The faster you insert data, the faster you need to compact in order to keep the SSTable count down. The recommended value is 16 to 32 times the rate of write throughput (in MB/second). Setting the value to 0 disables compaction throttling.
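For reference, on our nodes the setting is the stock single line in cassandra.yaml; the path below assumes a package install and may differ per deployment:

    # Check the configured value (file path is an assumption)
    grep 'compaction_throughput_mb_per_sec' /etc/cassandra/cassandra.yaml
    # compaction_throughput_mb_per_sec: 16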

My literal interpretation of the documentation text is: if the observed disk I/O is, say, 38 MB/s (for now, consider only the write load on the Cassandra nodes), then compaction_throughput_mb_per_sec should be set to 38 * 16 = 608 or 38 * 32 = 1216, and that irrespective of the compaction strategy.

If the above interpretation is correct, then kindly help me understand the actual meaning of the value 608 or 1216 in the context of throttling compaction and total throughput across the system for the Size Tiered Compaction Strategy (the default), with an example, perhaps by extending the one mentioned below.


The plot:

As per the documentation, the min_threshold value for SizeTieredCompactionStrategy is 6; in our case it is unchanged. On average, disk I/O per node is observed to be around 38 MB/s (only writes, no read operations happening). The compaction_throughput_mb_per_sec value is 16.

What would the compaction workflow be with the value 16? If we change it to 608, what exactly is going to change, what will be impacted, and how?
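For context, my understanding is that the change can be made either by editing cassandra.yaml and restarting, or at runtime via nodetool (going by the nodetool reference; the runtime change is not persisted and reverts to the cassandra.yaml value on restart):

    # Print the current compaction throttle in MB/s
    nodetool getcompactionthroughput

    # Raise the throttle to 608 MB/s on this node only (until restart)
    nodetool setcompactionthroughput 608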

Aniket Dumbare

1 Answer

Let's take another look at the meaning of compaction.

the compaction process merges keys, combines columns, evicts tombstones, consolidates SSTables, and creates a new index in the merged SSTable.

...

The compaction_throughput_mb_per_sec parameter is designed for use with large partitions because compaction is throttled to the specified total throughput across the entire system.

Refer: Configuring compaction

To preserve read performance in a mixed read-write workload, you need to mitigate the tendency of small SSTables to accumulate during a single long-running compaction.

Refer: concurrent_compactors

So when you update compaction_throughput_mb_per_sec, you update the rate at which new consolidated SSTables are written, which in turn helps you mitigate the tendency of small SSTables to accumulate during compaction.

So, in short, when you increase the value of compaction_throughput_mb_per_sec from 16 to 608, you raise the write throughput allowed for writing new consolidated SSTables, in turn reduce the chances of small SSTables accumulating, and finally improve read performance.
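One practical way to verify the impact is to watch compaction activity under a steady write load before and after the change:

    # Show active compactions and the pending-task backlog.
    # A pending count that keeps growing under steady write load
    # suggests the throttle (or concurrent_compactors) is too low.
    nodetool compactionstats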

Manojkumar Khotele