2

I need a key-value database like Redis or memcached, but stored on disk rather than in memory. After filling the database (which we do regularly and from scratch), I only need the get operation, but from many different processes (so Kyoto Cabinet and LevelDB do not work for me).

I need about 5 million keys and ~10-30 GB of data, so some other simple databases don't work either.

I can't find any information on whether RocksDB can handle multiple read-only clients; it's not straightforward to build on my OS, so I wanted to ask before doing that. If it can't, is there any database which would work? Preferably with an Ubuntu package and Python bindings ;-).

We're just using many, many small files now, but it really sucks, as we want easy backups, copying, etc. I also suspect this may cause slowdowns, but that doesn't really matter much.

keelar
Valentin Golev
  • Have you considered SQLite? – Lasse V. Karlsen Apr 24 '14 at 14:11
  • Actually, nope. Is it really good for such a use-case? – Valentin Golev Apr 24 '14 at 14:15
  • Yes, just create a single table with key/value columns. You would use normal SQL to access the table. Not sure keys can be purely binary though. With the advent of WAL (write-ahead logging), you can even support writing in parallel with multiple readers. SQLite only supports a single concurrent writer though. Also note that though you say "processes", if you mean different computers, then I would disregard SQLite. SQLite is good for local databases, but not so good for networked databases. – Lasse V. Karlsen Apr 24 '14 at 14:29
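
The single-table SQLite idea from the comments above could look roughly like the sketch below, using Python's standard `sqlite3` module. The table name `kv`, the helper names, and the file layout are illustrative assumptions, not anything from the thread. Both columns are declared as BLOB so that keys and values can be arbitrary bytes, and each reader opens the file with `mode=ro` so it cannot modify it.

```python
import os
import sqlite3
from pathlib import Path


def build_store(path, items):
    """Rebuild the database from scratch as one key/value table (hypothetical helper)."""
    if os.path.exists(path):
        os.remove(path)
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE kv (key BLOB PRIMARY KEY, value BLOB NOT NULL)"
            )
            conn.executemany("INSERT INTO kv (key, value) VALUES (?, ?)", items)
    finally:
        conn.close()


def open_reader(path):
    """Open the file read-only; every reader process would call this itself."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def get(conn, key):
    """Return the value stored under key, or None if the key is absent."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row is not None else None
```

Because the whole store is one file, backups and copying reduce to copying that file once the build step has finished.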

2 Answers

9

Yes, you should be able to run multiple read-only clients on a single RocksDB database. Just open the database with DB::OpenForReadOnly() call: https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L108

Igor Canadi
1

The simplest answer is probably Berkeley DB, and bindings are a part of the stdlib: https://docs.python.org/2/library/anydbm.html
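
As a rough illustration of the stdlib route, here is a minimal sketch assuming Python 3, where `anydbm` was renamed to `dbm`. Note that `dbm.open` picks whichever backend is available on the system, which is not necessarily Berkeley DB. The helper names are hypothetical.

```python
import dbm


def build_dbm(path, items):
    """Rebuild the store from scratch; flag "n" always creates a new, empty database."""
    with dbm.open(path, "n") as db:
        for key, value in items:
            db[key] = value


def dbm_get(path, key):
    """Open the store read-only (flag "r") and return the value, or None if missing."""
    with dbm.open(path, "r") as db:
        return db.get(key)
```

Keys and values are stored as bytes, so `dbm_get` returns `bytes` for a key that exists. A long-running reader would normally keep the handle open rather than reopening it on every lookup.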

Alex Gaynor
  • thanks, I'll try that! for some reason it says it's deprecated, and that I should use pybsddb: http://www.jcea.es/programacion/pybsddb_doc/contents.html – Valentin Golev Apr 25 '14 at 09:23
  • Berkeley DB performs horribly compared to all recent key-value DBs in every benchmark I found. – jiping-s Mar 10 '16 at 11:21