From my experience I'd expect pickling to be even more of a memory hog than what you've done so far. However, creating a dict loads every key and value in the shelf into memory at once, and you shouldn't assume that because your shelf is 6GB on disk, it's only 6GB in memory. For example:
```python
>>> import sys, pickle
>>> sys.getsizeof(1)
24
>>> len(pickle.dumps(1))
4
>>> len(pickle.dumps(1, -1))
5
```
So a very small integer takes roughly 5-6 times more memory as a Python int object (24 bytes on my machine) than it does once pickled (4 or 5 bytes).
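To see the same effect on a whole mapping rather than a single int, here is a rough Python 3 sketch (CPython assumed). `approx_dict_footprint` is a hypothetical helper that only adds up shallow `sys.getsizeof` figures, so treat its result as an estimate rather than a precise measurement; the exact numbers will vary by Python version and platform.

```python
import pickle
import sys

def approx_dict_footprint(d):
    """Rough shallow estimate: the dict's own table plus every key and value object.

    Objects shared between entries are counted once per reference, so this is
    only a guide, not an exact measurement."""
    return sys.getsizeof(d) + sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in d.items())

# Illustrative data: 100,000 small int keys mapped to small int values
data = {i: i * 2 for i in range(100_000)}

in_memory = approx_dict_footprint(data)
on_disk = len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
print(f"in memory ~{in_memory:,} bytes, pickled {on_disk:,} bytes, ratio {in_memory / on_disk:.1f}x")
```

With small ints the in-memory figure comes out several times larger than the pickle, which is the same gap as the single-int example, just multiplied across every entry.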
As for the workaround: you can write more than one pickled object to a single file. So don't convert the shelf to a dict; instead, write a long sequence of keys and values to your file, then read the same sequence back on the other side and put each pair into your new shelf. That way you only need one key/value pair in memory at a time. Something like this:
Write:
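```python
import pickle

# Record how many pairs follow, then pickle one (key, value) tuple at a time
with open('myshelf.pkl', 'wb') as outfile:
    pickle.dump(len(myShelf), outfile)
    for p in myShelf.iteritems():  # Python 2: yields pairs one at a time
        pickle.dump(p, outfile)
```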
Read:
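```python
import pickle

# myShelf is now the new shelf: read the count, then exactly that many pairs
with open('myshelf.pkl', 'rb') as infile:
    for _ in xrange(pickle.load(infile)):
        k, v = pickle.load(infile)
        myShelf[k] = v
```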
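The snippets above are Python 2 (`iteritems`, `xrange`). Under Python 3 the same count-prefixed stream might look like the sketch below. The function names `dump_shelf` and `load_shelf`, the temporary paths, and the two-item demo shelf are all hypothetical; it assumes ordinary string keys, as `shelve` requires.

```python
import os
import pickle
import shelve
import tempfile

def dump_shelf(shelf_path, stream_path):
    """Write the pair count, then one pickled (key, value) tuple per item."""
    with shelve.open(shelf_path, flag="r") as src, open(stream_path, "wb") as out:
        pickle.dump(len(src), out)
        for pair in src.items():  # the items view unpickles one value at a time
            pickle.dump(pair, out, protocol=pickle.HIGHEST_PROTOCOL)

def load_shelf(stream_path, shelf_path):
    """Read the count, then exactly that many pairs into a brand-new shelf."""
    with open(stream_path, "rb") as inp, shelve.open(shelf_path, flag="n") as dst:
        for _ in range(pickle.load(inp)):
            key, value = pickle.load(inp)
            dst[key] = value

# Demo round trip in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    old, new, stream = (os.path.join(tmp, name) for name in ("old", "new", "shelf.pkl"))
    with shelve.open(old, flag="n") as s:
        s["a"] = [1, 2, 3]
        s["b"] = {"x": 1}
    dump_shelf(old, stream)
    load_shelf(stream, new)
    with shelve.open(new, flag="r") as s:
        restored = dict(s)

print(restored)
```

Opening the destination with `flag="n"` always starts from an empty shelf, so re-running the load doesn't mix in stale entries.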
I don't think you actually need to store the length; you could just keep reading until `pickle.load` raises `EOFError`, which indicates it has run out of file.
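A minimal Python 3 sketch of that length-free variant: a generator that keeps calling `pickle.load` and stops on `EOFError`. The name `iter_pickled` is hypothetical, and an in-memory `io.BytesIO` stands in for `myshelf.pkl` so the example runs on its own.

```python
import io
import pickle

def iter_pickled(fileobj):
    """Yield successive pickled objects from fileobj until the data runs out."""
    while True:
        try:
            obj = pickle.load(fileobj)
        except EOFError:  # pickle.load raises this when no bytes remain
            return
        yield obj

# Write a few (key, value) pairs with no leading count, then read them back
buf = io.BytesIO()
for pair in [("a", 1), ("b", [2, 3])]:
    pickle.dump(pair, buf)
buf.seek(0)

pairs = list(iter_pickled(buf))
print(pairs)  # [('a', 1), ('b', [2, 3])]
```

On the reading side this would become `for k, v in iter_pickled(infile): myShelf[k] = v`, still holding only one pair in memory at a time.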