
The way I want to use RocksDB is, I think, unusual. I want to use it to lower the memory pressure of an application that holds a very large number of strings in memory. The reason is that the application will eventually scale to the point where it would otherwise require dozens of gigabytes of RAM to store all the strings. This is a 64-bit-only application, with parts written in C++ and parts in VB.NET (I know. I know.)

I've been tasked with moving all of the strings to disk.

I want to be as performant as possible. Sure, I could use something like SQLite, but I really don't need SQL at all. I just need a key/value store. The key can be a 32-bit int, and the value will be a string. Typical strings are 1K to 5K in length.

The performance characteristics required are as follows:

  1. Strings are written to disk in bulk. After being written, they're rarely modified. Most of the time, they are simply read.
  2. Strings are written to disk only as a way to move them out of RAM. Keeping all strings in RAM at the same time, for performance, would defeat the purpose. Ideally, I can specify how much RAM to use as a cache.
  3. Durability is not important. I don't care if the write cache takes a long time to flush. In fact, I only care about a string being written to disk when the buffer size specified in (2) above is exceeded. For instance, if I have a billion strings on disk and keep a thousand of them in RAM (as my buffer size), I'd be okay if those thousand are not written to disk until the thousand-and-first string is allocated.
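To make (2) and (3) concrete, here's a toy sketch (plain C++, no database involved) of the buffering behavior I'm after. `SpillingCache` is a name I made up for illustration, and the `std::map` here just stands in for whatever the real on-disk store would be:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

// Toy sketch of the desired policy: keep at most `capacity` strings in RAM;
// when a new string would exceed that, spill the buffered strings to the
// "disk" store and drop them from RAM. No durability anywhere.
class SpillingCache {
public:
    explicit SpillingCache(std::size_t capacity) : capacity_(capacity) {}

    void put(std::uint32_t key, std::string value) {
        if (ram_.size() >= capacity_) flush();   // spill only when the buffer is full
        ram_[key] = std::move(value);
    }

    std::string get(std::uint32_t key) {
        auto it = ram_.find(key);
        if (it != ram_.end()) return it->second; // RAM hit
        return disk_.at(key);                    // fall back to the "disk" store
    }

    std::size_t ram_size() const { return ram_.size(); }

private:
    void flush() {                               // write-back; order doesn't matter
        for (auto& kv : ram_) disk_[kv.first] = std::move(kv.second);
        ram_.clear();
    }

    std::size_t capacity_;
    std::unordered_map<std::uint32_t, std::string> ram_;
    std::map<std::uint32_t, std::string> disk_;  // stand-in for the on-disk engine
};
```

That's the whole policy I want, just with a real persistent store (and memory-mapped files) underneath instead of a `std::map`.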

Pretty much every system I've looked at up to this point — memcached, Redis, LevelDB, Lightning (LMDB), the LSM store from SQLite 4 — solves a different problem. Some solve the problem of ensuring things are persisted for durability, so there's a lot going on to make them crash-proof. Obviously, in my case, I don't care about being crash-proof. My application will recreate the data store when the app starts. If my app crashes, I don't care about the content of what is left on disk. Yet others (such as memcached) are there to optimize disk performance by putting things in RAM first; that's solving a problem that's almost the opposite of the one I need to solve.

Ultimately, and since this is a 64-bit system, I'd like the system to use memory-mapped files for optimization.

RocksDB comes closest to the tool I think I need, but it's a very confusing and complex system with a million settings. Furthermore, my specific scenario isn't covered in any of its posted "recipes".
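From skimming the RocksDB docs, my best guess at the handful of settings that map onto my requirements is below (C++). The sizes and path are placeholders I made up, not tuned values, and I may well be misusing these options — which is exactly why I'm asking:

```cpp
#include <rocksdb/cache.h>
#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

// My best-guess configuration; sizes and path are placeholders.
rocksdb::DB* OpenStringStore() {
    rocksdb::Options options;
    options.create_if_missing = true;
    options.write_buffer_size = 64 << 20;  // RAM budget for the write buffer, per (2)
    options.allow_mmap_reads = true;       // memory-mapped reads, per my wish above

    rocksdb::BlockBasedTableOptions table_options;
    table_options.block_cache = rocksdb::NewLRUCache(256 << 20);  // read-cache budget
    options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_options));

    rocksdb::DB* db = nullptr;
    rocksdb::Status s = rocksdb::DB::Open(options, "strings.db", &db);
    return s.ok() ? db : nullptr;
}

// Writes would then skip the write-ahead log entirely, since I don't
// need durability, per (3):
//   rocksdb::WriteOptions wo;
//   wo.disableWAL = true;
//   db->Put(wo, key, value);
```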

So I'm curious if anyone from the RocksDB team is willing to give me some guidance. If I can get this to work, I'll be very grateful and will certainly help others online to solve the same type of problem.

Daisha Lynn

2 Answers


I didn't quite get the part where you say you need to move data to disk but don't need durability.

Other than that, leveldb would be a great choice:

* fast writes
* ability to do atomic bulk inserts (WriteBatch)
* low memory footprint
* fast key lookups (and fast iterators for reading adjacent data)
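For illustration, a minimal sketch of the WriteBatch bulk-insert path (C++; the database path, keys, and values here are made up for the example):

```cpp
#include <leveldb/db.h>
#include <leveldb/write_batch.h>
#include <string>

int main() {
    leveldb::Options options;
    options.create_if_missing = true;

    leveldb::DB* db = nullptr;
    leveldb::Status s = leveldb::DB::Open(options, "strings.db", &db);
    if (!s.ok()) return 1;

    // All Puts in a batch are applied atomically in one Write() call.
    leveldb::WriteBatch batch;
    batch.Put("1", "first string");
    batch.Put("2", "second string");
    s = db->Write(leveldb::WriteOptions(), &batch);

    // Point lookup by key.
    std::string value;
    s = db->Get(leveldb::ReadOptions(), "1", &value);

    delete db;
    return 0;
}
```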

You didn't specify a platform, but it's native on Linux, and on Windows you could use one of the Windows ports (with .NET wrappers).

ren
  • Windows is the platform. – Daisha Lynn Jun 14 '16 at 11:57
  • Durability is to ensure that data is not lost if there's a system failure. I don't need that because this is data that would otherwise be in RAM anyway. I need to put it in the database to save RAM, not because I care about the data persisting across restarts. – Daisha Lynn Jun 14 '16 at 11:58

Sqlite Index Blaster was developed for exactly this purpose: one-time inserts, sacrificing durability for performance.

It uses an LRU cache, so you can specify how much RAM it should use to tweak performance.

It can be used as a Key-Value store or a regular clustered-index table store.

It stores data in the widely used SQLite file format.

Disclaimer: I am the author of this repo.

Arundale Ramanathan