The way I want to use RocksDB is, I think, unusual. I want to use it to lower the memory pressure of an application that holds a very large number of strings in memory. The application will eventually scale to the point where it would otherwise need dozens of gigabytes of RAM to store all the strings. This is a 64-bit-only application with parts of it written in C++ and parts in VB.NET (I know. I know.)
I've been tasked with moving all of the strings to disk.
I want this to be as performant as possible. Sure, I could use something like SQLite, but I really don't need SQL at all. I just need a key/value store. The key can be a 32-bit int, and the value will be a string. Typical strings are 1K to 5K in length.
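Stores such as LevelDB and RocksDB take keys as byte strings, so a 32-bit int key needs a fixed-width encoding. Here is a minimal sketch of one way to do that, assuming big-endian byte order so that byte-wise key comparison matches numeric order. `encode_key` and `decode_key` are hypothetical helper names of my own, not part of any library.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: packs a 32-bit key into 4 bytes, most significant byte first.
std::string encode_key(std::uint32_t key) {
    std::string out(4, '\0');
    out[0] = static_cast<char>((key >> 24) & 0xFF);
    out[1] = static_cast<char>((key >> 16) & 0xFF);
    out[2] = static_cast<char>((key >> 8) & 0xFF);
    out[3] = static_cast<char>(key & 0xFF);
    return out;
}

// Hypothetical helper: inverse of encode_key. Assumes bytes.size() == 4.
std::uint32_t decode_key(const std::string& bytes) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[0])) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[1])) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[2])) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[3]));
}
```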
The performance characteristics required are as follows:
- Strings are written to disk in bulk. Once written, they're rarely modified. Most of the time, they are only read.
- Strings are being written to disk only as a way to move them out of RAM. Keeping all strings in RAM at the same time, for performance, would defeat the purpose. Ideally, I can specify how much RAM to take up as a cache.
- Durability is not important. I don't care if the write-cache takes a long time to reach the disk. In fact, I only care about a string being written to disk when the buffer size from the previous point is exceeded. For instance, if I have a billion strings on disk and keep a thousand of them in RAM (as my buffer size), I'd be okay with those thousand not being written to disk until the thousand-and-first string is allocated.
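
To make the last two points concrete, here is a rough sketch of the behaviour I'm after, written with only the C++ standard library. This is not RocksDB code: `SpillBuffer`, `put`, `get`, and `max_buffered` are hypothetical names of my own, and the append-only file layout is just an assumption for illustration. Strings stay in RAM until the buffer limit is hit, and only then are they appended to disk in one pass, with no fsync and no crash recovery.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical illustration only; none of these names come from RocksDB.
class SpillBuffer {
public:
    SpillBuffer(std::string path, std::size_t max_buffered)
        : path_(std::move(path)), max_buffered_(max_buffered) {
        // Start from an empty file: the store is rebuilt on every app start.
        std::ofstream truncate(path_, std::ios::binary | std::ios::trunc);
    }

    // Buffers the string in RAM; touches the disk only when the buffer is full.
    void put(std::uint32_t key, std::string value) {
        if (buffer_.find(key) == buffer_.end() && buffer_.size() >= max_buffered_) {
            flush();
        }
        buffer_[key] = std::move(value);
    }

    // Looks in the RAM buffer first, then falls back to the file.
    bool get(std::uint32_t key, std::string& out) const {
        auto hit = buffer_.find(key);
        if (hit != buffer_.end()) { out = hit->second; return true; }
        auto loc = index_.find(key);
        if (loc == index_.end()) return false;
        std::ifstream in(path_, std::ios::binary);
        in.seekg(static_cast<std::streamoff>(loc->second.offset));
        out.resize(loc->second.length);
        in.read(&out[0], static_cast<std::streamsize>(out.size()));
        return static_cast<bool>(in);
    }

    std::size_t buffered() const { return buffer_.size(); }
    std::size_t on_disk() const { return index_.size(); }

private:
    struct Location { std::uint64_t offset; std::size_t length; };

    // Appends every buffered string to the file in one pass, without fsync.
    void flush() {
        std::ofstream out(path_, std::ios::binary | std::ios::app);
        for (const auto& entry : buffer_) {
            index_[entry.first] = Location{end_, entry.second.size()};
            out.write(entry.second.data(),
                      static_cast<std::streamsize>(entry.second.size()));
            end_ += entry.second.size();
        }
        buffer_.clear();
    }

    std::string path_;
    std::size_t max_buffered_;
    std::uint64_t end_ = 0;  // current length of the file in bytes
    std::unordered_map<std::uint32_t, std::string> buffer_;
    std::unordered_map<std::uint32_t, Location> index_;
};
```

With `max_buffered` set to 1000, the first thousand `put` calls never touch the disk, and the thousand-and-first triggers a single bulk append. Note that the offset index still lives in RAM in this sketch, which would not hold up at a billion strings; a real store would have to handle that differently.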
Pretty much every system I've looked at up to this point (memcached, Redis, LevelDB, Lightning, LSM from SQLite 4) solves a different problem. Some solve the problem of ensuring things are persisted for durability, so there's a lot going on to make sure things are crash-proof. Obviously, in my case, I don't care about things being crash-proof. My application will recreate the data store when the app starts. If my app crashes, I don't care about the content of what is left on disk. Others (such as memcached) are there to optimize disk performance by putting things in RAM first. That solves a problem that's almost the opposite of the one I need to solve.
Ultimately, and since this is a 64-bit system, I'd like the system to use memory-mapped files for optimization.
RocksDB comes the closest to the tool that I think I need to use, but it's a very confusing and complex system with a million settings. Furthermore, my specific scenario isn't in any of its posted "recipes".
So I'm curious if anyone from the RocksDB team is willing to give me some guidance. If I can get this to work, I'll be very grateful and will certainly help others online to solve the same type of problem.