
Edit: I'm using Windows 10 and Python 3.5.

I have a function that uses mmap to remove bytes from a file at a certain offset:

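import mmap

def delete_bytes(fobj, offset, size):
    # Find the file size by seeking to the end.
    fobj.seek(0, 2)
    filesize = fobj.tell()
    # Number of bytes after the deleted region that must shift down.
    move_size = filesize - offset - size

    fobj.flush()
    file_map = mmap.mmap(fobj.fileno(), filesize)
    # Slide the tail of the file over the region being deleted.
    file_map.move(offset, offset + size, move_size)
    file_map.close()

    # Drop the now-duplicated bytes at the end of the file.
    fobj.truncate(filesize - size)
    fobj.flush()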

It works super fast, but when I run it on a large number of files, the memory quickly fills up and my system becomes unresponsive.

After some experimenting, I found that the move() method was the culprit here, and in particular the amount of data being moved (move_size). The amount of memory used equals the total amount of data moved by mmap.move(). If I process 100 files and move ~30 MB in each, about 3 GB of memory fills up.

Why isn't the moved data released from memory?

Things I tried that had no effect:

  • calling gc.collect() at the end of the function.
  • rewriting the function to move in small chunks.
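For reference, the chunked variant mentioned in the last bullet could look roughly like the sketch below. The name delete_bytes_chunked and the chunk_size parameter are made up for illustration and are not the original code. Because the destination lies below the source, the chunks are moved from the lowest offset upward so no unread data is overwritten.

```python
import mmap

def delete_bytes_chunked(fobj, offset, size, chunk_size=1024 * 1024):
    # Hypothetical chunked version of delete_bytes; chunk_size is an assumption.
    fobj.seek(0, 2)
    filesize = fobj.tell()
    move_size = filesize - offset - size

    fobj.flush()
    if filesize > 0 and move_size > 0:
        # The context manager closes (unmaps) the mapping on exit.
        with mmap.mmap(fobj.fileno(), filesize) as file_map:
            moved = 0
            while moved < move_size:
                count = min(chunk_size, move_size - moved)
                # Shift one chunk of the tail down by `size` bytes.
                file_map.move(offset + moved, offset + size + moved, count)
                moved += count

    fobj.truncate(filesize - size)
    fobj.flush()
```

The file must be opened in "r+b" mode. As noted in the bullet, chunking the move did not change the memory usage in my case.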
mahkitah
  • What operating system are you using? Python version as well. – wind85 Jun 14 '16 at 12:01
  • Can you please also check if the memory is used by your python process or by the OS? – Leon Jun 14 '16 at 12:23
  • Sorry, forgot to mention: I'm on Win10 and python 3.5. How do I check if the memory is used by python or OS? – mahkitah Jun 14 '16 at 12:39
  • Doesn't Windows 10 include the Task Manager system utility? – Leon Jun 14 '16 at 13:41
  • Operating systems tend to keep pages that have been used in the buffer cache, because in general they're likely to be used again. Maybe your buffer cache is filling up. Also; are you actually closing the file objects somewhere? – Roland Smith Jun 14 '16 at 22:56

1 Answer


This seems like it should work. I did find one suspicious bit in the mmapmodule.c source code, in the section under #ifdef MS_WINDOWS. Specifically, after all the setup to parse arguments, the code does this:

if (fileno != -1 && fileno != 0) {
    /* Ensure that fileno is within the CRT's valid range */
    if (_PyVerify_fd(fileno) == 0) {
        PyErr_SetFromErrno(PyExc_OSError);
        return NULL;
    }
    fh = (HANDLE)_get_osfhandle(fileno);
    if (fh==(HANDLE)-1) {
        PyErr_SetFromErrno(PyExc_OSError);
        return NULL;
    }
    /* Win9x appears to need us seeked to zero */
    lseek(fileno, 0, SEEK_SET);
}

which moves your underlying file object's offset from "end of file" to "start of file" and then leaves it there. That seems like it should not break anything, but it might be worth doing your own seek-to-start-of-file just before calling mmap.mmap to map the file.
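A minimal sketch of that suggestion follows. The helper name map_from_start and its length parameter are hypothetical, and whether this changes the memory behavior on Windows is untested; it only makes the seek explicit instead of relying on the module to do it.

```python
import mmap

def map_from_start(fobj, length):
    # Hypothetical helper: rewind the file ourselves before mapping it.
    fobj.flush()
    fobj.seek(0)  # explicit seek-to-start, mirroring the lseek() in the C code
    return mmap.mmap(fobj.fileno(), length)
```

In delete_bytes this would replace the direct mmap.mmap(fobj.fileno(), filesize) call, with filesize passed as length.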

(Everything below is wrong, but left in since there are comments on it.)


In general, after using mmap(), you must use munmap() to undo the mapping. Simply closing the file descriptor has no effect. The Linux documentation calls this out explicitly:

munmap()
The munmap() system call deletes the mappings for the specified address range, and causes further references to addresses within the range to generate invalid memory references. The region is also automatically unmapped when the process is terminated. On the other hand, closing the file descriptor does not unmap the region.

(The BSD documentation is similar. Windows may behave differently from Unix-like systems here, but what you are seeing suggests that they work the same way.)

Unfortunately, Python's mmap module does not bind the munmap system call (nor mprotect), at least as of both 2.7.11 and 3.4.4. As a workaround you can use the ctypes module. See this question for an example (it calls reboot but the same technique works for all C library functions). Or, for a somewhat nicer method, you can write wrappers in .

torek
  • Doesn't `mmap.close()` perform `unmap()` underneath? – Leon Jun 14 '16 at 13:14
  • `mmap.close()` does call `UnmapViewOfFile` (windows) or `munmap` (unix) (python 3.4, mmapmodule.c). – J.J. Hakala Jun 14 '16 at 13:21
  • The mapping itself isn't the problem. If I remove the line with `mmap.move()` or replace it with another method (like `mmap.resize()`) there's no problem at all. – mahkitah Jun 14 '16 at 14:37
  • @J.J.Hakala: interesting; my go-to Python (FreeBSD) does not have `mmap.close` at all. @mahkitah: if you do not call `mmap.move` (which I also do not have but I imagine it turns into a C library `memmove` call) or otherwise "touch" the memory, it won't get page-faulted-in in the first place. On Unix-y systems I would try `strace` or `ktrace` or whatever other system-call-tracing facility is around to see if the OS un-map function is being invoked. – torek Jun 14 '16 at 19:52
  • @J.J.Hakala: D'oh, I'm missing the obvious: it's not `mmap.close()`, but rather `mmap().close()`. – torek Jun 14 '16 at 19:57