I have used Python's shelve library to pre-store a list of key-vector pairs. There are 3 million entries in total, which take up 6 GB. In a separate training file, I need to check for every record whether it is a key in the shelve dictionary. This makes my program extremely slow. Is there a fast way to check whether a key exists in a shelve? Or are there other efficient ways to store key-vector pairs in Python, so that checking whether a key exists is fast?

Jack Cheng
The best way to do this would be to move to a more efficient k/v store (like redis) or perhaps even a database (depending on what you want to do). – Burhan Khalid Mar 09 '15 at 07:25

1 Answer

Use sqlite3 instead of shelve; it also lets you run queries that go beyond looking up a single key. Note too that shelve makes no promise that a database file will be usable on another Python version or platform.
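As a rough illustration of the single-table approach, here is a minimal sketch. The table name `vectors`, the helper names, and the choice of JSON text for serializing vectors are all assumptions made for the example, not anything from the question. Because `key` is the primary key, the existence check is an index lookup rather than a scan.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a key-vector store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vectors ("
        "key TEXT PRIMARY KEY, vec TEXT NOT NULL)"
    )
    return conn


def put(conn, key, vector):
    # Vectors are stored as JSON text here; any serialization would do.
    conn.execute(
        "INSERT OR REPLACE INTO vectors (key, vec) VALUES (?, ?)",
        (key, json.dumps(vector)),
    )


def has_key(conn, key):
    # Uses the primary-key index, so no full table scan is needed.
    row = conn.execute(
        "SELECT 1 FROM vectors WHERE key = ? LIMIT 1", (key,)
    ).fetchone()
    return row is not None


def get(conn, key):
    row = conn.execute(
        "SELECT vec FROM vectors WHERE key = ?", (key,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

Each `has_key` call still goes through the database, so for millions of checks it may be worth wrapping inserts in a single transaction and batching lookups.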

Better still, use sqlite3 and store all the keys in a separate table (with a UNIQUE constraint), referenced by a foreign key from the vector table. You can then scan the key table once and keep the key list in memory as a set (this should need only about 40 MiB + 3 M * average key size of RAM).
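The two-table layout could be sketched roughly as follows. The table and column names, the helper functions, and the JSON serialization are assumptions for illustration only. Membership tests then run against an in-memory set, and the database is touched only when a vector is actually needed.

```python
import json
import sqlite3

# Hypothetical schema: keys live in their own table with a UNIQUE constraint,
# and each vector row points back to its key through a foreign key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS keys (
    id  INTEGER PRIMARY KEY,
    key TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS vectors (
    key_id INTEGER PRIMARY KEY REFERENCES keys(id),
    vec    TEXT NOT NULL
);
"""


def build_db(pairs, path=":memory:"):
    """Create the tables and load (key, vector) pairs."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    for key, vector in pairs:
        cur = conn.execute("INSERT INTO keys (key) VALUES (?)", (key,))
        conn.execute(
            "INSERT INTO vectors (key_id, vec) VALUES (?, ?)",
            (cur.lastrowid, json.dumps(vector)),
        )
    conn.commit()
    return conn


def load_key_set(conn):
    """Scan the key table once and return all keys as an in-memory set."""
    return {row[0] for row in conn.execute("SELECT key FROM keys")}


def fetch_vector(conn, key):
    """Look up the vector for a key via the foreign-key join."""
    row = conn.execute(
        "SELECT v.vec FROM vectors AS v "
        "JOIN keys AS k ON k.id = v.key_id WHERE k.key = ?",
        (key,),
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

In the training loop, a check such as `if record in known_keys:` (with `known_keys = load_key_set(conn)`) is then a plain set lookup, and `fetch_vector` is called only for records that pass it.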