I have a file of size 500 MB.
If I store each line of that file in a dictionary, set up like this:
    delimiter = ','
    store_dict = {}
    file = "my_file.csv"
    with open(file) as f:
        for l in f:
            line = l.split(delimiter)
            hash_key = delimiter.join(line[:4])
            store_line = delimiter.join(line[4:])
            store_dict[hash_key] = store_line
To check memory usage, I watched htop while the program ran: first with the code above, then with the last line switched to
    print(hash_key + ":" + store_line)
The print version took < 100 MB of memory.
With the dictionary version, the size of store_dict is approximately 1.5 GB in memory. I have checked for memory leaks and can't find any, and removing just the line store_dict[hash_key] = store_line brings the program back under 100 MB. Why does the dictionary take up so much memory? Is there any way to store the lines in a dictionary without it taking up so much memory?
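To illustrate the scale of the overhead I'm seeing, here is a rough sketch (the key and value strings are hypothetical stand-ins for my real rows) comparing the raw character count of one entry against the actual size of the Python objects, using sys.getsizeof:

```python
import sys

# Hypothetical stand-ins for one row split into key and remainder
key = "a,b,c,d"          # first 4 fields joined
value = "e,f,g,h,i,j"    # remaining fields joined

# Raw payload: just the characters on disk
raw_bytes = len(key) + len(value)

# Actual object sizes: each CPython str carries a fixed header
# (roughly 49+ bytes) on top of its character data, and the dict
# itself adds hash-table slots per entry on top of that
object_bytes = sys.getsizeof(key) + sys.getsizeof(value)

print(raw_bytes, object_bytes)
```

On my understanding, this per-object header overhead, multiplied across millions of short strings plus the dict's own hash table, is what could turn 500 MB of file data into 1.5 GB of resident memory, but I'd like confirmation.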