Why are Get and MultiGet significantly slower for large key sets compared to using an Iterator?

Question

I'm currently playing around with RocksDB (C++) and was curious about some performance metrics I've experienced.

For testing purposes, my database keys are file paths and the values are filenames. My database has around 2M entries in it. I'm running RocksDB locally on a MacBook Pro 2016 (SSD).

My use case is dominated by reads. Full key scans are quite common, as are key scans that include a "significant" number of keys (50%+).

I'm curious about the following observations:

1. An Iterator is dramatically faster than calling Get when performing full key scans.

When I want to look at all of the keys in the database, I'm seeing a 4-8x performance improvement when using an Iterator instead of calling Get for each key. The use of MultiGet makes no difference.

In the case of calling Get roughly 2M times, the keys have been previously fetched into a vector and sorted lexicographically. Why is calling Get repeatedly so much slower than using an Iterator? Is there a way to narrow the performance gap between the two APIs?

2. When fetching around half the keys, the performance difference between using an Iterator and Get becomes negligible.

As the number of keys to fetch shrinks, making multiple calls to Get starts to take about as long as using an Iterator, because the iterator pays the price of scanning over keys that aren't in the desired key set.

Is there some "magic" ratio where this becomes true for most databases? For example, if I need to scan over 25% of the keys, then calling Get is faster, but if it's 75% of the keys, then an Iterator is faster. But those numbers are just "made up" by rough testing.

3. Fetching keys in sorted order does not appear to improve performance.

If I pre-sort the keys I want to fetch into the same order that an Iterator would return them in, that does not appear to make calling Get multiple times any faster. Why is that? It's mentioned in the documentation that it's recommended to sort keys before doing a batch insert. Does Get not benefit from the same look-ahead caching that an Iterator benefits from?

4. What settings are recommended for a read-heavy use case?

Finally, are there any specific settings recommended for a read-heavy use case that might involve scanning a significant number of keys at once?

macOS 10.14.3, MacBook Pro 2016 SSD, RocksDB 5.18.3, Xcode 10.1

What are the compiler options used to build your code? Are you timing an optimized build? — PaulMcKenzie, Mar 26 '19 at 16:41
I'm statically linking against RocksDB as installed by Brew. Under Xcode, I'm performing a 'Release' build. I'm basically timing a simple `for` loop that either uses the `Iterator` or makes multiple calls to `Get`. The values, in both cases, are being read into a vector just to prevent any no-op behavior. Each loop is run several times in succession to account for any disk caching. For me, this isn't about micro-benchmarking but rather rough benchmarking and I'm seeing differences of well over 100% between the two APIs, hence the inquiry. — kennyc, Mar 26 '19 at 16:49

score 3 · Accepted Answer · answered Mar 30 '19 at 00:30

RocksDB internally represents its data as a log-structured merge tree which has several sorted layers by default (this can be changed with plugins/config). The intuition from Paul's first answer holds, except there is no classical index; the data is actually sorted on disk with pointers to the next files. The lookup operation has on average logarithmic complexity, but advancing an iterator in a sorted range is constant time. So for dense sequential reads, iterating is much faster.
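The difference between a fresh lookup per key and advancing a cursor can be illustrated without RocksDB at all. The following is only a sketch under stated assumptions: it uses `std::map` (a balanced tree, not an LSM tree) as a stand-in sorted store, and `CountingLess`, `ScanCost`, and `CompareScanCost` are hypothetical names invented for this illustration. It counts the key comparisons spent on one `find` per key and the increments spent on a single in-order pass.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Comparator that counts how many key comparisons the container performs.
struct CountingLess {
    std::size_t* count;
    bool operator()(const std::string& a, const std::string& b) const {
        ++*count;
        return a < b;
    }
};

// Stand-in for a sorted key-value store (assumption: NOT how RocksDB stores data).
using SortedStore = std::map<std::string, std::string, CountingLess>;

struct ScanCost {
    std::size_t lookup_comparisons;  // comparisons spent on one find() per key
    std::size_t iterator_steps;      // increments spent on one in-order pass
};

// Builds n sorted entries, then reads all of them twice: once with a fresh
// lookup per key (analogous to Get) and once with a single forward pass
// (analogous to Iterator::Next).
ScanCost CompareScanCost(std::size_t n) {
    std::size_t comparisons = 0;
    SortedStore store(CountingLess{&comparisons});
    for (std::size_t i = 0; i < n; ++i) {
        store.emplace("/path/" + std::to_string(1000000 + i),
                      "file" + std::to_string(i));
    }

    comparisons = 0;  // ignore the cost of building the store
    for (std::size_t i = 0; i < n; ++i) {
        auto it = store.find("/path/" + std::to_string(1000000 + i));
        assert(it != store.end());  // every key was inserted above
    }
    const std::size_t lookup_comparisons = comparisons;

    std::size_t steps = 0;
    for (auto it = store.begin(); it != store.end(); ++it) {
        ++steps;  // one step per entry, no search from the top
    }
    return ScanCost{lookup_comparisons, steps};
}
```

Calling `CompareScanCost(n)` for growing `n` should show `lookup_comparisons` growing faster than `iterator_steps`, since each lookup restarts from the root while the pass only moves to the neighbouring entry. Real RocksDB lookups add further per-call work (memtables, several levels, block reads), so this toy only shows the shape of the effect.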

The point where the costs balance out is determined not only by the number of keys you read, but also by the size of the database. As the database grows, the lookup becomes slower, while Next() remains constant. Very recent inserts are likely to be read very fast, since they may still be in memory (memtables).
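As a rough way to reason about that balance point, one can assume a deliberately simplified cost model: a full scan costs about `total_keys * cost_per_next`, and point reads cost about `wanted_keys * cost_per_get`. Both per-operation costs are values you would have to measure on your own database, and `BreakEvenFraction` is a hypothetical helper, not part of RocksDB. The model ignores caching and readahead.

```cpp
#include <algorithm>
#include <cassert>

// Back-of-the-envelope model (an assumption, not a RocksDB formula):
//   full scan cost  ~ total_keys  * cost_per_next
//   point read cost ~ wanted_keys * cost_per_get
// Returns the fraction wanted_keys / total_keys at which both costs are equal.
// Below that fraction, per-key lookups are cheaper under this model;
// above it, scanning with an iterator is cheaper.
double BreakEvenFraction(double cost_per_get, double cost_per_next) {
    assert(cost_per_get > 0.0 && cost_per_next > 0.0);
    return std::min(1.0, cost_per_next / cost_per_get);
}
```

Because `cost_per_get` tends to grow with the size of the database while `cost_per_next` stays roughly flat, the break-even fraction under this model shrinks as the database grows, which is why no single ratio holds for every database.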

Sorting the keys actually just improves your cache hit rate. Depending on your disk, the difference may be very small: with an NVMe SSD, for example, the gap in access time is no longer as drastic as it was between RAM and an HDD. If you have to run several operations over the same or even different key sets, grouping them by key range (f(a-c) g(a-c) f(d-g)...) instead of running each operation over all of its keys in turn should improve your performance, since you will get more cache hits and also benefit from the RocksDB block cache.

The tuning guide is a good starting point, especially the video on database solutions, but if RocksDB is too slow for you, also consider using a DB based on a different storage algorithm. LSM is typically better for write-heavy workloads, and while RocksDB lets you control read vs. write vs. space amplification very well, a B-tree or ISAM-based solution may just be much faster for range reads and repeated reads.

score 1 · Answer 2 · answered Mar 26 '19 at 16:54

I don't know anything about RocksDB per se, but I can answer a lot of this from first principles.

An Iterator is dramatically faster than calling Get when performing full key scans.

This is likely to be because Get has to do a full lookup in the underlying index (starting from the top) whereas advancing an iterator can be achieved by just moving from the current node to the next. Assuming the index is implemented as a red-black tree or similar, there's a lot less work in the second method than the first.

When fetching around half the keys, the performance difference between using an Iterator and Get becomes negligible.

So you are skipping entries by calling iterator->Next() multiple times? If so, then there will come a point where it's cheaper to call Get for each key instead, yes. Exactly when that happens will depend on the number of entries in the index (since that determines the number of levels in the tree).
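One way to avoid choosing between the two extremes is to keep a single forward cursor and only jump when the gap to the next wanted key is large. The following is a sketch of that idea, again using `std::map` as a stand-in: `++cursor` plays the role of `Next()` and `lower_bound` plays the role of a seek. `FetchSorted` and the `max_steps` threshold are hypothetical names for this illustration, and a sensible threshold would have to be found by measurement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Store = std::map<std::string, std::string>;  // stand-in sorted store

// Fetches the values for `wanted` (which must be sorted ascending) using one
// forward cursor. For each key, the cursor is stepped at most `max_steps`
// times; if the key is still ahead, the cursor is repositioned with
// lower_bound instead of walking the rest of the gap. Keys that are not in
// the store are skipped.
std::vector<std::string> FetchSorted(const Store& store,
                                     const std::vector<std::string>& wanted,
                                     std::size_t max_steps) {
    assert(std::is_sorted(wanted.begin(), wanted.end()));
    std::vector<std::string> values;
    values.reserve(wanted.size());
    auto cursor = store.begin();
    for (const std::string& key : wanted) {
        std::size_t steps = 0;
        while (cursor != store.end() && cursor->first < key && steps < max_steps) {
            ++cursor;  // cheap step to the neighbouring entry
            ++steps;
        }
        if (cursor != store.end() && cursor->first < key) {
            cursor = store.lower_bound(key);  // gap too large: jump
        }
        if (cursor != store.end() && cursor->first == key) {
            values.push_back(cursor->second);
        }
    }
    return values;
}
```

With a dense key set the loop degenerates into a plain scan, and with a sparse one it degenerates into one seek per key, so the same code covers both ends of the range discussed above. This only works because the wanted keys are sorted, which is the one place where pre-sorting pays off directly.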

Fetching keys in sorted order does not appear to improve performance.

No, I would not expect it to. Get is (presumably) stateless.

What settings are recommended for a read-heavy use case?

That I don't know, sorry, but you might read:

https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide

+1 Thanks for the insight Paul. The statelessness of `Get` is probably the biggest difference between the two, which makes sense the more I dig into it. — kennyc, Mar 27 '19 at 08:52
