
I'm trying to use RocksDB to store billions of records, so the resulting databases are fairly large: hundreds of gigabytes, and several terabytes in some cases. The data is initially imported from a snapshot of another service and updated from Kafka afterwards, but that's beside the point.

There are two parts of the problem:

Part 1) The initial data import takes hours with autocompactions disabled (and days if I enable them). After that I reopen the database with autocompactions enabled, but no compaction is triggered automatically when the DB is opened, so I have to run one manually in Go with CompactRange(Range{nil, nil}). The manual compaction takes almost as long and keeps only one CPU core busy. While it runs, the overall size of the DB grows 2x-3x, and it then ends up at around 0.5x of the size before compaction.

Question 1: Is there a way to avoid the 2x-3x growth in data size during compaction? It becomes a problem when the data size reaches terabytes. I use the default Level Compaction, which according to the docs "optimizes disk footprint vs. logical database size (space amplification) by minimizing the files involved in each compaction step".

Question 2: Is it possible to engage more CPU cores for manual compaction? It looks like only one is used at the moment (even though MaxBackgroundCompactions = 32). It would speed up the process A LOT, since there are no writes during the initial manual compaction; I just need to prepare the DB without waiting days. Would it work to have several goroutines working on different sets of keys instead of just one working on all keys? If yes, what's the best way to divide the keys into these sets?
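The "several goroutines on different sets of keys" idea could look roughly like the sketch below. It assumes the default bytewise comparator and keys spread roughly evenly over their first byte (for example, hashed keys); `keyRange`, `compactor`, `splitByFirstByte`, `compactParallel` and `logCompactor` are hypothetical names, and `compactor` stands in for the real DB handle so the sketch runs without RocksDB. Whether RocksDB actually executes concurrent manual CompactRange calls in parallel is exactly the open question here, so this only shows how the key space could be cut into non-overlapping ranges, with nil meaning an open end as in Range{nil, nil}.

```go
package main

import (
	"fmt"
	"sync"
)

// keyRange mirrors a RocksDB key range: Start is inclusive, Limit is
// exclusive, and nil means "unbounded" on that side.
type keyRange struct {
	Start, Limit []byte
}

// compactor is a hypothetical stand-in for the DB handle. A real version
// would forward to the binding's CompactRange with the same Start/Limit.
type compactor interface {
	CompactRange(r keyRange)
}

// splitByFirstByte divides the whole key space into n contiguous,
// non-overlapping ranges, using the first key byte as the split point.
// Balanced only if keys are roughly uniform over their first byte.
func splitByFirstByte(n int) []keyRange {
	if n < 1 {
		n = 1
	}
	if n > 256 {
		n = 256
	}
	ranges := make([]keyRange, 0, n)
	var prev []byte // nil: the first range has no lower bound
	for i := 1; i < n; i++ {
		b := []byte{byte(i * 256 / n)}
		ranges = append(ranges, keyRange{Start: prev, Limit: b})
		prev = b
	}
	// The last range has no upper bound.
	return append(ranges, keyRange{Start: prev, Limit: nil})
}

// compactParallel issues one CompactRange call per range, each in its own
// goroutine, and waits for all of them to return.
func compactParallel(c compactor, ranges []keyRange) {
	var wg sync.WaitGroup
	for _, r := range ranges {
		wg.Add(1)
		go func(r keyRange) {
			defer wg.Done()
			c.CompactRange(r)
		}(r)
	}
	wg.Wait()
}

// logCompactor is a fake compactor that only records the ranges it gets.
type logCompactor struct {
	mu    sync.Mutex
	calls []keyRange
}

func (l *logCompactor) CompactRange(r keyRange) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.calls = append(l.calls, r)
}

func main() {
	ranges := splitByFirstByte(4)
	for _, r := range ranges {
		fmt.Printf("[%v, %v)\n", r.Start, r.Limit)
	}
	c := &logCompactor{}
	compactParallel(c, ranges)
	fmt.Println("ranges compacted:", len(c.calls))
}
```

Splitting on the first byte is the simplest choice; if the keys are skewed (e.g. a few common prefixes), boundaries sampled from the actual data would give more even ranges.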

Part 2) Even after this manual compaction, RocksDB seems to run autocompaction later, once I start adding/updating data, and after it's done the DB gets even smaller: around 0.4x compared to the size before the manual compaction.

Question 3: What's the difference between manual compaction and autocompaction, and why does autocompaction seem to be more effective in terms of the resulting data size?

My project is in Go, but I'm more or less familiar with RocksDB C++ code and I couldn't find any answers to these questions in the docs or in the source code.
