
We have a high-throughput use case (50k TPS, or 180M transactions per hour); about 15-30 million of these operations per hour are deletes. Given that YugabyteDB is a log-structured database and overwritten data is reclaimed only at compactions, what would be the hit on read performance?

1 Answer


The impact of a large number of deletes/overwrites to a key is pretty minimal in YugabyteDB.

YugabyteDB uses an LSM (log-structured merge-tree) based storage engine. So it is true that reads may have to consult multiple SSTable files before returning an answer, and compactions periodically reduce the number of SSTable files to keep read overheads minimal.

In fact, the number of SSTable files can have a more pronounced effect on read performance than the number of overwrites of a key. But here too, bloom filters help minimize the number of SSTable files that need to be consulted for a read.
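
To make that concrete, here's a toy Python sketch (illustrative only, not YugabyteDB's actual code) of an LSM read path where each SSTable carries a bloom filter, so files that definitely don't contain the key are never probed:

    # Toy illustration: why bloom filters keep the per-read SSTable
    # fan-out small in an LSM engine.
    import hashlib

    class BloomFilter:
        """Minimal bloom filter: k hash functions over an m-bit array."""
        def __init__(self, m=1024, k=3):
            self.m, self.k, self.bits = m, k, 0

        def _positions(self, key):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{key}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, key):
            for p in self._positions(key):
                self.bits |= 1 << p

        def might_contain(self, key):
            # False means "definitely absent"; True means "possibly present".
            return all(self.bits >> p & 1 for p in self._positions(key))

    class SSTable:
        """An immutable sorted file; modeled here as a dict plus its bloom filter."""
        def __init__(self, data):
            self.data = dict(data)
            self.bloom = BloomFilter()
            for k in data:
                self.bloom.add(k)

    def read(key, sstables):
        """Consult files newest-to-oldest, skipping any the bloom filter rules out."""
        for sst in sstables:
            if not sst.bloom.might_contain(key):
                continue  # definitely absent: no need to probe this file at all
            if key in sst.data:
                return sst.data[key]
        return None

    # Three "files" (newest first): only those whose filter matches get probed.
    sstables = [SSTable({"k3": "v3"}), SSTable({"k2": "v2"}), SSTable({"k1": "v1"})]
    print(read("k2", sstables))  # -> "v2"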

The impact of a large number of overwrites to a key is minimal in YugabyteDB for a few reasons:

  • A read in an LSM engine is performed as a logical merge of the memtables/SSTables, which are sorted in descending timestamp order for each key. In effect, the read sees the latest value of the row first, and the overhead of deletes (which appear further down in the logical sort order) should not be observable in practice.

  • Flushes and minor compactions only need to retain the latest deleted/overwritten value; all older overwrites can be garbage collected immediately, without waiting for a major compaction. This is unlike Apache Cassandra, which uses eventually consistent replication and must therefore retain delete tombstones much longer to avoid deleted values resurfacing. Because YugabyteDB replicates with the Raft protocol, no such special handling is needed for deletes. (Both behaviors are sketched right after this list.)
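
Here's a similarly toy sketch of both bullet points, again a model rather than YugabyteDB internals: per-key versions sorted in descending timestamp order mean a read only ever touches the head entry, and a flush/minor compaction can immediately discard everything below it:

    # Toy model: versions of one key, newest first. A read returns the head
    # entry without scanning older overwrites; a minor compaction keeps only
    # the newest version (old tombstones need not linger, since Raft-based
    # replication avoids the deleted-value-resurfacing problem).

    TOMBSTONE = object()  # sentinel marking a delete

    def read_latest(entries):
        """entries: list of (timestamp, value) sorted descending by timestamp."""
        ts, value = entries[0]  # only the head is ever examined
        return None if value is TOMBSTONE else value

    def minor_compact(entries):
        """Keep only the newest version; all older overwrites are GC'd now."""
        return entries[:1]

    history = [(50, "v5"), (40, TOMBSTONE), (30, "v3"), (20, TOMBSTONE), (10, "v1")]
    print(read_latest(history))    # -> "v5"; the 4 older versions cost nothing
    print(minor_compact(history))  # -> [(50, "v5")]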

Finally, here's a sample program I tried against YugabyteDB 2.0.10.0 on an RF=1 cluster:

https://gist.github.com/kmuthukk/f93a5373dbaddd4e49427cc7b4130427

This program first performs $iters (default 25) overwrites of each key (by deleting it first, and then inserting it back), and then measures the read times. The average read latency was about 0.35 ms. Changing $iters to 1 or 50 doesn't make any significant difference in the read latencies:

$iters=1
Read ops per sec: 2836.8794326241
Avg read latency (ms): 0.35

$iters=25
Read ops per sec: 2857.1428571429
Avg read latency (ms): 0.35

$iters=50
Read ops per sec: 2836.8794326241
Avg read latency (ms): 0.35
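
For those who prefer not to open the gist, below is a minimal Python sketch of the same kind of experiment. It is a sketch under stated assumptions, not the actual program from the gist: it assumes a local node serving the YCQL API on 127.0.0.1:9042 and the Python cassandra-driver package, and the keyspace/table names and counts are illustrative.

    # Sketch: overwrite each key ITERS times (delete + insert), then time reads.
    import time
    from cassandra.cluster import Cluster

    ITERS = 25        # overwrites per key (try 1 / 25 / 50)
    NUM_KEYS = 1000   # keys to load and then read back

    session = Cluster(["127.0.0.1"]).connect()
    session.execute("CREATE KEYSPACE IF NOT EXISTS k")
    session.execute("CREATE TABLE IF NOT EXISTS k.t (k INT PRIMARY KEY, v INT)")

    insert = session.prepare("INSERT INTO k.t (k, v) VALUES (?, ?)")
    delete = session.prepare("DELETE FROM k.t WHERE k = ?")
    select = session.prepare("SELECT v FROM k.t WHERE k = ?")

    # Overwrite each key ITERS times: delete it, then insert it back.
    for k in range(NUM_KEYS):
        for i in range(ITERS):
            session.execute(delete, (k,))
            session.execute(insert, (k, i))

    # Measure point-read latency across all keys.
    start = time.time()
    for k in range(NUM_KEYS):
        session.execute(select, (k,))
    elapsed = time.time() - start
    print(f"Read ops per sec: {NUM_KEYS / elapsed:.2f}")
    print(f"Avg read latency (ms): {1000 * elapsed / NUM_KEYS:.2f}")

Prepared statements are used so the measured time reflects the server-side read path rather than per-request query parsing.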