When I overwrite a key in a shelve, the shelve file sometimes keeps growing instead of staying at the size of the stored value. It is as if old data in the shelve is left behind with nothing referencing it, much like a memory leak. It seems to have something to do with appending to the list that I store. Does anyone know why? Here is a minimal example:
import os
import shelve
import numpy as np

def run():
    list_array = []
    expected_size = 0
    for i in range(5):
        # Append another 100 MB array (float64, 8 bytes per element).
        array_100mb = np.zeros(1024*1024*100//8)
        list_array.append(array_100mb)
        expected_size = 100*len(list_array)
        # Overwrite the same key with the now longer list.
        with shelve.open('shelve_test') as s:
            s['val'] = list_array
        size_mb = os.path.getsize('shelve_test.dat') // 1024 // 1024
        print(f'Iteration {i}: \t shelve size is {size_mb}Mb; \t expected size is {expected_size}Mb')

for j in range(5):
    run()
    print()
This outputs:
Iteration 0: shelve size is 100Mb; expected size is 100Mb
Iteration 1: shelve size is 300Mb; expected size is 200Mb
Iteration 2: shelve size is 600Mb; expected size is 300Mb
Iteration 3: shelve size is 1000Mb; expected size is 400Mb
Iteration 4: shelve size is 1500Mb; expected size is 500Mb
Iteration 0: shelve size is 1500Mb; expected size is 100Mb
Iteration 1: shelve size is 1700Mb; expected size is 200Mb
Iteration 2: shelve size is 2000Mb; expected size is 300Mb
Iteration 3: shelve size is 2400Mb; expected size is 400Mb
Iteration 4: shelve size is 2900Mb; expected size is 500Mb
Iteration 0: shelve size is 2900Mb; expected size is 100Mb
Iteration 1: shelve size is 3100Mb; expected size is 200Mb
Iteration 2: shelve size is 3400Mb; expected size is 300Mb
Iteration 3: shelve size is 3800Mb; expected size is 400Mb
Iteration 4: shelve size is 4300Mb; expected size is 500Mb
Iteration 0: shelve size is 4300Mb; expected size is 100Mb
Iteration 1: shelve size is 4500Mb; expected size is 200Mb
Iteration 2: shelve size is 4800Mb; expected size is 300Mb
Iteration 3: shelve size is 5200Mb; expected size is 400Mb
Iteration 4: shelve size is 5700Mb; expected size is 500Mb
Iteration 0: shelve size is 5700Mb; expected size is 100Mb
Iteration 1: shelve size is 5900Mb; expected size is 200Mb
Iteration 2: shelve size is 6200Mb; expected size is 300Mb
Iteration 3: shelve size is 6600Mb; expected size is 400Mb
Iteration 4: shelve size is 7100Mb; expected size is 500Mb
Python version is 3.6.6
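
For what it is worth, the sizes above fit a simple model, sketched below. The `.dat` file suggests that `shelve` is using the `dbm.dumb` backend here, and the sketch assumes that this backend overwrites a value in place only when the new value fits in the space recorded for the old one, and otherwise appends it to the end of the file without reclaiming the old bytes. The function `simulate_file_size` and its parameters are hypothetical names made up for this illustration; the model ignores pickle overhead and block rounding.

```python
def simulate_file_size(runs=5, iterations=5, value_mb=100):
    """Model a store that never reclaims space (assumed behaviour).

    A value is overwritten in place only if it fits in the space
    recorded for the previous value; otherwise it is appended.
    """
    file_mb = 0   # total size of the data file
    slot_mb = 0   # size recorded for the key's current value
    sizes = []
    for _ in range(runs):
        for i in range(iterations):
            new_mb = value_mb * (i + 1)   # the list holds i + 1 arrays
            if new_mb > slot_mb:
                # Does not fit: appended, the old bytes stay in the file.
                file_mb += new_mb
            # Either way, the index now records the new, possibly smaller, size.
            slot_mb = new_mb
            sizes.append(file_mb)
    return sizes


if __name__ == "__main__":
    for n, size in enumerate(simulate_file_size()):
        print(f"Iteration {n % 5}: modelled shelve size is {size}Mb")
```

Under these assumptions the model gives 100, 300, 600, 1000, 1500 for the first run, stays at 1500 for the first iteration of the second run (100 MB fits in the old 500 MB slot), and ends at 7100, which matches the output I see.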