I have used python shelve library to pre-store a list of key-vector pairs. There are 3 million entries in total, which takes 6 GB memory storage. On a separate training file, I need to check for every record if it is a key in the shelve dictionary. This renders my program extremely slow to run. Is there a fast way to check if a key exists in shelve? Or are there other efficient ways to store key-vector pairs in python, so that it is efficient to check if a key exist?
Asked
Active
Viewed 1,890 times
2
-
The best way to do this would be to move to a more efficient k/v store (like redis) or perhaps even a database (depending on what you want to do). – Burhan Khalid Mar 09 '15 at 07:25
1 Answers
2
Use sqlite3
instead of shelve, and you can query things beyond just asking for arbitrary key. Also do note that shelve
does not give any promises about such database being usable on any other Python version, or platform or anything thereof.
Even better though, use sqlite3
and store all the keys separately (with unique) and reference these by a foreign key from the vector table; you can scan and keep the key list in memory as a set
(should need only say ~40 MiB + 3 M * average key size of RAM).

Antti Haapala -- Слава Україні
- 129,958
- 22
- 279
- 321
-
1Is sqlite3 faster when I just need to save one value per key and quickly get it back? – Groosha Nov 08 '15 at 10:15