Edit: I'm on Windows 10 with Python 3.5.
I have a function that uses mmap to remove bytes from a file at a certain offset:
```python
import mmap

def delete_bytes(fobj, offset, size):
    fobj.seek(0, 2)               # seek to end of file
    filesize = fobj.tell()
    move_size = filesize - offset - size
    fobj.flush()
    file_map = mmap.mmap(fobj.fileno(), filesize)
    # shift the tail of the file left, over the region being deleted
    file_map.move(offset, offset + size, move_size)
    file_map.close()
    fobj.truncate(filesize - size)
    fobj.flush()
```
It works very fast, but when I run it on a large number of files, memory quickly fills up and my system becomes unresponsive.
After some experimenting, I found that the move() method was the culprit, and in particular the amount of data being moved (move_size).
The memory used is roughly equal to the total amount of data moved by mmap.move(): if I process 100 files with ~30 MB moved in each, about 3 GB of memory fills up.
Why isn't the moved data released from memory?
Things I tried that had no effect:

- calling `gc.collect()` at the end of the function
- rewriting the function to move the data in small chunks