16

I am trying to debug a memory problem in my large Python application. Most of the memory is in numpy arrays managed by Python classes, so Heapy and similar tools are useless, since they do not account for the memory in the numpy arrays. So I tried to manually track the memory usage with the Mac OS X (10.7.5) Activity Monitor (or top, if you will). I noticed the following weird behavior in a normal Python interpreter shell (2.7.3):

import numpy as np # 1.7.1
# Activity Monitor: 12.8 MB
a = np.zeros((1000, 1000, 17)) # a "large" array
# 142.5 MB
del a
# 12.8 MB (so far so good, the array got freed)
a = np.zeros((1000, 1000, 16)) # a "small" array
# 134.9 MB
del a
# 134.9 MB (the system didn't get back the memory)
import gc
gc.collect()
# 134.9 MB

No matter what I do, the memory footprint of the Python session will never go below 134.9 MB again. So my question is:

Why are the resources of arrays larger than 1000x1000x17x8 bytes (found empirically on my system) properly given back to the system, while the memory of smaller arrays appears to be stuck with the Python interpreter forever?

This does appear to ratchet up, since in my real-world applications, I end up with over 2 GB of memory I can never get back from the Python interpreter. Is this intended behavior that Python reserves more and more memory depending on usage history? If yes, then Activity Monitor is just as useless as Heapy for my case. Is there anything out there that is not useless?
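
For what it's worth, the following is roughly how I probed for that threshold (not my exact script, just a sketch that reads the process RSS via ps instead of watching Activity Monitor; rss_mb is a helper name made up for this example):

import os
import subprocess
import numpy as np

def rss_mb():
    # resident set size of this process in MB, as reported by ps (same number top / Activity Monitor show)
    return int(subprocess.check_output(["ps", "-o", "rss=", "-p", str(os.getpid())])) / 1024.0

for third_dim in (17, 16):                     # the "large" and "small" arrays from above
    before = rss_mb()
    a = np.zeros((1000, 1000, third_dim))
    del a
    print third_dim, rss_mb() - before         # ~0 means the buffer was handed back to the OS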

Stefan
  • Interesting, on Linux even smaller arrays are returned to the OS. That's quite surprising, since often, `malloc` doesn't actually return anything to the OS -- it just places `free`'d memory on its own free list for later reuse. – Fred Foo Aug 19 '13 at 09:52
  • @larsmans: So you don't see an increased memory usage of the Python interpreter after creating/del'ing numpy arrays of various sizes on Linux? – Stefan Aug 19 '13 at 09:57
  • I see it increase after `np.zeros` and decrease again after `del`. Have you tried tools like `malloc_history` or `vmmap`? Those could give some insight into how Python/NumPy deal with memory. – Fred Foo Aug 19 '13 at 09:58
  • @larsmans: ...and on Linux there is no threshold size (~130 MB) like I am seeing on MacOSX? So this does not appear to be intended behavior then. I will look into the tools you suggested. – Stefan Aug 19 '13 at 10:01
  • Even with `a = [np.zeros(10000) for i in xrange(10000)]`, I see the memory usage drop back to the old level after `del a`. – Fred Foo Aug 19 '13 at 10:05
  • Hm, that wasn't a very good test. When I do `a = [np.zeros(100) for i in xrange(1000000)]` and then `del a`, the small buffers do *not* get returned to the OS. So the threshold is there, but it's much lower. – Fred Foo Aug 19 '13 at 10:06
  • @larsmans: OK, good to know. Could you find out what the threshold is on your system? I found the 130 MB on MacOSX to be well reproducible. Is there anything you could do to eventually get the memory back? – Stefan Aug 19 '13 at 10:10
  • Related discussion: [Numpy's policy for releasing memory](http://numpy-discussion.10968.n7.nabble.com/Numpy-s-policy-for-releasing-memory-td1533.html) – Bakuriu Aug 19 '13 at 10:37

1 Answer

19

Reading Numpy's policy for releasing memory, it seems that numpy does not do any special handling of memory allocation/deallocation: it simply calls free() when the reference count goes to zero. In fact, it's pretty easy to replicate the issue with any built-in Python object. The problem lies at the OS level.

Nathaniel Smith has written an explanation of what is happening in one of his replies in the linked thread:

> In general, processes can request memory from the OS, but they cannot give it back. At the C level, if you call free(), then what actually happens is that the memory management library in your process makes a note for itself that that memory is not used, and may return it from a future malloc(), but from the OS's point of view it is still "allocated". (And python uses another similar system on top for malloc()/free(), but this doesn't really change anything.) So the OS memory usage you see is generally a "high water mark", the maximum amount of memory that your process ever needed.
>
> The exception is that for large single allocations (e.g. if you create a multi-megabyte array), a different mechanism is used. Such large memory allocations can be released back to the OS. So it might specifically be the non-numpy parts of your program that are producing the issues you see.

So, it seems there is no general solution to the problem. Allocating many small objects will lead to a "high memory usage" as profiled by these tools, even though the memory will be reused when needed, while allocating big objects won't show big memory usage after deallocation, because the memory is reclaimed by the OS.

You can verify this by allocating built-in Python objects:

In [1]: a = [[0] * 100 for _ in range(1000000)]

In [2]: del a

After this code I can see that memory is not reclaimed, while doing:

In [1]: a = [[0] * 10000 for _ in range(10000)]

In [2]: del a

the memory is reclaimed.
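
The same thing happens with numpy arrays themselves. A rough sketch (I'm reading the process RSS via ps here; any process monitor works, and the exact sizes at which the allocator switches strategy differ between platforms):

import os
import gc
import subprocess
import numpy as np

def rss_mb():
    # resident set size of the current process in MB, as reported by ps
    return int(subprocess.check_output(["ps", "-o", "rss=", "-p", str(os.getpid())])) / 1024.0

base = rss_mb()
small = [np.zeros(100) for _ in xrange(100000)]   # roughly 80 MB of data spread over many tiny buffers
del small
gc.collect()
print "many small arrays:", rss_mb() - base, "MB still above baseline"

base = rss_mb()
big = np.zeros(10 ** 7)                           # one buffer of roughly 80 MB
del big
gc.collect()
print "one big array:    ", rss_mb() - base, "MB still above baseline"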

To avoid memory problems, you should either allocate a few big arrays and work with them (maybe using views to "simulate" small arrays?), or avoid having many small arrays alive at the same time. If some loop creates small objects, you might explicitly deallocate the ones that are no longer needed at every iteration instead of doing so only at the end.
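
As a sketch of the first option (the shapes here are made up): instead of creating many independent small arrays, pre-allocate one big buffer and hand out views into it:

import numpy as np

n_chunks, chunk_len = 10000, 100

buf = np.empty((n_chunks, chunk_len))        # one big allocation up front

# the "small arrays" are just views into buf: no further allocations happen
chunks = [buf[i] for i in xrange(n_chunks)]
chunks[42][:] = 1.0                          # writes go straight into the shared buffer

del chunks, buf                              # frees one large block instead of many small ones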


I believe Python Memory Management gives good insight into how memory is managed in Python. Note that, on top of the "OS problem", Python adds another layer of its own to manage memory arenas, which can contribute to high memory usage with small objects.
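
If you want to peek at that extra layer, recent CPython builds ship a private helper that dumps the small-object allocator's arena/pool statistics to stderr (it is an implementation detail, so it may be missing on some interpreter versions):

import sys

junk = [{"x": i} for i in xrange(100000)]   # lots of small objects, served from pymalloc arenas

sys._debugmallocstats()                     # prints arena/pool/block statistics to stderr

del junk
sys._debugmallocstats()                     # compare how many arenas are still held after freeing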

Bakuriu
  • This is very relevant, thanks. I could reproduce the behavior with `l = [i for i in xrange(100000000)]`, where `del l` didn't reclaim the memory right away. However, after `gc.collect()`, I got all the memory back. Is there a way I can force `numpy` to do the same? – Stefan Aug 19 '13 at 11:00
  • Also, if this boils down to the fact that OS memory usage indicators are useless for Python/numpy memory debugging, and since Heapy et al don't work for numpy arrays, is there something out there that one can use to debug memory usage of a large Python + numpy project? – Stefan Aug 19 '13 at 11:12
  • 1
    @Stefan In the case of integers *maybe* it was because a *whole* arena was freed and the interpreter decided to release it, and probably arenas are big enough to trigger the "OS reclaiming" behaviour. Unfortunately `numpy` uses `malloc()` and `free()` directly, which means that the Python interpreter does not have *any* control over that memory; only the library implementing `free()` could have control over it. Unfortunately I don't know of better tools that would allow analysing this kind of situation. – Bakuriu Aug 19 '13 at 11:19
  • 1
    @Stefan On Linux your example does not behave the same way: in Python 2 the memory is not reclaimed (even when using `gc.collect()`), while in Python 3 the `del l` is enough to reclaim the memory. The behaviour seems to change both across OSes and across Python versions (which is another clue that, in certain situations, how Python manages its memory arenas is also involved). – Bakuriu Aug 19 '13 at 11:38