I have a large object (a NumPy ndarray) whose memory I want to free as soon as I am finished with it inside my Python C extension; otherwise I will quickly run out of memory. How can I do this safely?

It seems that in Python, something like `del arr` followed by `gc.collect()` would work, but I don't want to wait until my function returns. I have also considered calling the destructor directly with `((PyObject*)arr)->ob_type->tp_dealloc((PyObject*)arr); Py_DECREF(arr)`, but this seems likely to cause a segmentation fault further down the line, since I am not sure whether I own the reference (I got it from the tuple passed to my function as `PyObject* args`).

Another option might be to free the underlying C array with `free`, replace the ndarray's data pointer with a different one, and adjust the shape and other ndarray members accordingly. This feels hackish, but I am looking through the NumPy C API to see whether it is really possible.

Any tips would be greatly appreciated. Thanks in advance.

Zach Boyd
  • If you're worried about when memory will be freed after a `del` command in Python and this is critical then you probably should be writing this in C (just my humble opinion). – Bill Dec 28 '18 at 22:49
  • Perhaps I was unclear--I am writing in C. The reference to the Python commands was supposed to help clarify what I wanted to do in C. – Zach Boyd Dec 28 '18 at 22:50
  • Ah I see now. Sorry I can't help. – Bill Dec 28 '18 at 22:52
  • "seems that in Python, something like del arr followed by gc.collect() would work," CPython uses reference counting, so these will not work unless `del` removes the last reference. The `gc` module only works with the cyclic garbage detector, likely irrelevant in your case – juanpa.arrivillaga Dec 28 '18 at 23:12
  • Good point. Given the reference counting, it seems like directly modifying the underlying (C) data array is the only viable option. – Zach Boyd Dec 28 '18 at 23:18
  • If the array is being passed into the function that sounds unsafe to me – juanpa.arrivillaga Dec 28 '18 at 23:39
  • Avoiding breaking the invariants of the object (i.e. the dimensions match the data) is an obvious issue. Are there others I should be aware of? This is just prototype code, so a certain level of hackery may be ok. – Zach Boyd Dec 29 '18 at 00:09
  • You can make your interface trade in wrappers for the actual array objects: if Python clients never have any references to the arrays, you can replace them in the wrapper (perhaps with `None`). – Davis Herring Dec 29 '18 at 00:14
  • That's an interesting idea. So you are suggesting that I pass in a wrapper class with the ndarray as its only member, then just set that member to None when I am done with it? After that, will it be possible to guarantee that the garbage collector will run soon? I will need that memory only a few lines later within the function. Also, nice to see you again, Davis! – Zach Boyd Dec 29 '18 at 00:19
  • @ZachBoyd: Hi! If the wrapper has the only reference to the array, CPython will destroy it the moment it's `DECREF`ed (down to 0). It's even possible to hide the reference from normal Python code, or to check its reference count (and fail if it's greater than 1) when the memory-intensive operation starts. – Davis Herring Dec 29 '18 at 02:12
  • _but I don't want to wait until my function returns_: why not create a smaller function that returns at the moment you want the memory released? Hard to give concrete advice without the code. – Gnik Dec 29 '18 at 04:34
  • @DavisHerring Ah of course. Still figuring out Python extensions, but in retrospect that would obviously work. Thanks! – Zach Boyd Dec 29 '18 at 15:58
  • @ZachBoyd: Do you want that as an answer? I didn’t want to assume outright that such a type/usage restriction was acceptable. – Davis Herring Dec 29 '18 at 16:09
  • Yes, go ahead and post it. Changing the input type is ok in this case. – Zach Boyd Dec 29 '18 at 16:23
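
The wrapper idea discussed in the comments can be illustrated with a minimal Python sketch. It is not code from the thread: `BigBuffer` (a plain-Python stand-in for a large ndarray), `ArrayHolder`, `release`, and `process` are hypothetical names, and the behaviour shown relies on CPython's reference counting. A weak reference is used as a probe to check whether the payload was destroyed at the moment the wrapper let go of it, with the cyclic collector disabled to show that `gc.collect()` plays no part.

```python
import gc
import weakref


class BigBuffer:
    """Hypothetical stand-in for a large ndarray (plain Python, no NumPy needed)."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)


class ArrayHolder:
    """Hypothetical wrapper intended to hold the only reference to the payload."""

    def __init__(self, payload):
        self.payload = payload

    def release(self):
        """Drop the payload; return True if it was destroyed immediately."""
        probe = weakref.ref(self.payload)  # weak refs do not keep the object alive
        self.payload = None                # drops the wrapper's strong reference
        return probe() is None             # a dead weak ref means the object is gone


def process(holder):
    # ... work with holder.payload would go here ...
    freed = holder.release()
    # If freed is True, the memory is already available here, before returning.
    return freed


gc.disable()  # assumption for the demo: show that the cyclic collector is not involved
try:
    # Case 1: the wrapper holds the only reference.
    holder = ArrayHolder(BigBuffer(1_000_000))
    freed_via_wrapper = process(holder)

    # Case 2: the caller keeps its own reference, so nothing can be freed early.
    buf = BigBuffer(1_000_000)
    shared = ArrayHolder(buf)
    freed_when_shared = process(shared)
finally:
    gc.enable()

print("freed when wrapper held the only reference:", freed_via_wrapper)
print("freed when the caller kept a reference:", freed_when_shared)
```

In a C extension, the step corresponding to `self.payload = None` would presumably be something like `Py_CLEAR` on the wrapper's member (or setting the attribute to `None`). The second case mirrors the original concern: while the caller still holds a reference, no amount of `del` inside the function can release the memory.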

0 Answers