I'm writing Python bindings for a C library that uses shared memory buffers to store its internal state. The allocation and freeing of these buffers is done outside of Python by the library itself, but I can indirectly control when this happens by calling wrapped constructor/destructor functions from within Python. I'd like to expose some of the buffers to Python so that I can read from them, and in some cases push values to them. Performance and memory use are important concerns, so I would like to avoid copying data wherever possible.

My current approach is to create a numpy array that provides a direct view onto a ctypes pointer:

import numpy as np
import ctypes as C

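libc = C.CDLL('libc.so.6')
# ctypes assumes C int arguments and return values unless told otherwise,
# which would truncate 64-bit pointers
libc.malloc.restype = C.c_void_p
libc.malloc.argtypes = [C.c_size_t]
libc.free.argtypes = [C.c_void_p]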

class MyWrapper(object):

    def __init__(self, n=10):
        # buffer allocated by external library
        addr = libc.malloc(C.sizeof(C.c_int) * n)
        self._cbuf = (C.c_int * n).from_address(addr)

    def __del__(self):
        # buffer freed by external library
        libc.free(C.addressof(self._cbuf))
        self._cbuf = None

    @property
    def buffer(self):
        return np.ctypeslib.as_array(self._cbuf)

As well as avoiding copies, this also means I can use numpy's indexing and assignment syntax and pass it directly to other numpy functions:

wrap = MyWrapper()
buf = wrap.buffer       # buf is now a writeable view of a C-allocated buffer

buf[:] = np.arange(10)  # this is pretty cool!
buf[::2] += 10

print(wrap.buffer)
# [10  1 12  3 14  5 16  7 18  9]

However, it's also inherently dangerous:

del wrap                # free the pointer

print(buf)              # this is bad!
# [1852404336 1969367156  538978662  538976288  538976288  538976288
#  1752440867 1763734377 1633820787       8548]

# buf[0] = 99           # uncomment this line if you <3 segfaults

To make this safer, I need to be able to check whether the underlying C pointer has been freed before I try to read/write to the array contents. I have a few thoughts on how to do this:

  • One way would be to generate a subclass of np.ndarray that holds a reference to the _cbuf attribute of MyWrapper, checks whether it is None before doing any reading/writing to its underlying memory, and raises an exception if this is the case.
  • I could easily generate multiple views onto the same buffer, e.g. by .view casting or slicing, so each of these would need to inherit the reference to _cbuf and the method that performs the check. I suspect that this could be achieved by overriding __array_finalize__, but I'm not sure exactly how.
  • The "pointer-checking" method would also need to be called before any operation that would read and/or write to the contents of the array. I don't know enough about numpy's internals to have an exhaustive list of methods to override.

How could I implement a subclass of np.ndarray that performs this check? Can anyone suggest a better approach?


Update: This class does most of what I want:

class SafeBufferView(np.ndarray):

    def __new__(cls, get_buffer, shape=None, dtype=None):
        obj = np.ctypeslib.as_array(get_buffer(), shape).view(cls)
        if dtype is not None:
            obj.dtype = dtype
        obj._get_buffer = get_buffer
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self._get_buffer = getattr(obj, "_get_buffer", None)

    def __array_prepare__(self, out_arr, context=None):
        if not self._get_buffer(): raise Exception("Dangling pointer!")
        return out_arr

    # this seems very heavy-handed - surely there must be a better way?
    def __getattribute__(self, name):
        if name not in ["__new__", "__array_finalize__", "__array_prepare__",
                        "__getattribute__", "_get_buffer"]:
            if not self._get_buffer(): raise Exception("Dangling pointer!")
        return super(SafeBufferView, self).__getattribute__(name)

For example:

wrap = MyWrapper()
sb = SafeBufferView(lambda: wrap._cbuf)
sb[:] = np.arange(10)

print(repr(sb))
# SafeBufferView([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

print(repr(sb[::2]))
# SafeBufferView([0, 2, 4, 6, 8], dtype=int32)

sbv = sb.view(np.double)
print(repr(sbv))
# SafeBufferView([  2.12199579e-314,   6.36598737e-314,   1.06099790e-313,
#          1.48539705e-313,   1.90979621e-313])

# we have to call the destructor method of `wrap` explicitly - `del wrap` won't
# do anything because `sb` and `sbv` both hold references to `wrap`
wrap.__del__()

print(sb)                # Exception: Dangling pointer!
print(sb + 1)            # Exception: Dangling pointer!
print(sbv)               # Exception: Dangling pointer!
print(np.sum(sb))        # Exception: Dangling pointer!
print(sb.dot(sb))        # Exception: Dangling pointer!

print(np.dot(sb, sb))    # oops...
# -70104698

print(np.extract(np.ones(10), sb))
# array([251019024,     32522, 498870232,     32522,         4,         5,
#               6,         7,        48,         0], dtype=int32)

# np.copyto(sb, np.ones(10, np.int32))    # don't try this at home, kids!

I'm sure there are other edge cases I've missed.


Update 2: I've had a play around with weakref.proxy, as suggested by @ivan_pozdeev. It's a nice idea, but unfortunately I can't see how it would work with numpy arrays. I could try to create a weakref to the numpy array returned by .buffer:

import weakref

wrap = MyWrapper()
wr = weakref.proxy(wrap.buffer)
print(wr)
# ReferenceError: weakly-referenced object no longer exists
# <weakproxy at 0x7f6fe715efc8 to NoneType at 0x91a870>

I think the problem here is that the np.ndarray instance returned by wrap.buffer immediately goes out of scope. A workaround would be for the class to instantiate the array on initialization, hold a strong reference to it, and have the .buffer() getter return a weakref.proxy to the array:

class MyWrapper2(object):

    def __init__(self, n=10):
        # buffer allocated by external library
        addr = libc.malloc(C.sizeof(C.c_int) * n)
        self._cbuf = (C.c_int * n).from_address(addr)
        self._buffer = np.ctypeslib.as_array(self._cbuf)

    def __del__(self):
        # buffer freed by external library
        libc.free(C.addressof(self._cbuf))
        self._cbuf = None
        self._buffer = None

    @property
    def buffer(self):
        return weakref.proxy(self._buffer)

However, this breaks if I create a second view onto the same array whilst the buffer is still allocated:

wrap2 = MyWrapper2()
buf = wrap2.buffer
buf[:] = np.arange(10)

buf2 = buf[:]   # create a second view onto the contents of buf

print(repr(buf))
# <weakproxy at 0x7fec3e709b50 to numpy.ndarray at 0x210ac80>
print(repr(buf2))
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

wrap2.__del__()

print(buf2[:])  # this is bad
# [1291716568    32748 1291716568    32748        0        0        0
#         0       48        0] 

print(buf[:])   # WTF?!
# [34525664        0        0        0        0        0        0        0
#         0        0]  

This is seriously broken - after calling wrap2.__del__() not only can I read and write to buf2 which was a numpy array view onto wrap2._cbuf, but I can even read and write to buf, which should not be possible given that wrap2.__del__() sets wrap2._buffer to None.
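
A likely explanation is that the second view holds a strong reference to the original array through its .base attribute, so setting _buffer to None does not destroy the array and the proxy stays valid. The following is a minimal sketch of that effect, assuming an ordinary NumPy-owned array in place of the C buffer (so it is safe to run) and CPython's reference counting; the function name is made up for illustration:

```python
import weakref
import numpy as np

def proxy_outlives_owner():
    """Show that a view keeps a weakly-proxied array alive."""
    arr = np.arange(10, dtype=np.int32)    # stands in for wrap2._buffer
    proxy = weakref.proxy(arr)
    view = proxy[:]                        # a real ndarray; its .base is arr

    del arr                                # like `self._buffer = None`
    still_alive = bool(proxy[0] == 0)      # view.base still references the array

    del view                               # drop the last strong reference
    try:
        proxy[0]
        raised = False
    except ReferenceError:
        raised = True
    return still_alive, raised

print(proxy_outlives_owner())
```

The proxy only starts raising ReferenceError once every view derived from the array is gone, which has nothing to do with whether the C pointer has been freed.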

ali_m
  • Have you looked into writing a wrapper using Cython? It has a cleaner (and probably safer) interface for getting views of memory buffers via "typed memoryviews" – JoshAdel Jun 23 '16 at 11:16
  • @JoshAdel Would typed memoryviews really help in this case? Ultimately the problem is that the allocation and freeing of these buffers is being done outside of Python by an external library which I have no direct control over. The best I can do is keep track of whether they *ought* to still be allocated, based on whatever library functions I've called so far. I suppose I could do my bookkeeping in Cython instead of Python, but I can't yet see a compelling advantage in switching to Cython (there are some other reasons why this would be painful at this stage). – ali_m Jun 23 '16 at 11:35
  • If you keep a strong reference to some object that will call the deleter for you (e.g. `cffi` (which you should *always* use rather than `ctypes`) has builtin support for a deleter with the `gc` method), then you don't have to worry about invalidated weak references. – o11c Jun 24 '16 at 03:24
  • @o11c `gc` is irrelevant here, since the buffers are allocated and freed outside of Python by the external library I'm wrapping. – ali_m Jun 24 '16 at 07:43
  • @ali_m Ideally, you want the Python code to *control* the external library's actions. – o11c Jun 24 '16 at 18:43
  • @o11c I'm still not quite sure what you mean. I do have high-level control over the library, in that I can call ctypes-wrapped functions that will ultimately result in the buffers being allocated or freed. Although the actual allocation/freeing happens outside of Python, I can keep track of the state of the buffers from within Python. What I want is a way to ensure that it isn't possible to read from or write to these buffers after I have called a library function that deallocates them. I'm worried about dangling pointers rather than memory leaks. – ali_m Jun 25 '16 at 20:28
  • The "correct" way to do this is to make your `MyWrapper` the new array's `base`, so the array keeps the `MyWrapper` alive. Unfortunately, I don't think there's any way to do this from the Python side. It seems like a strange feature to be missing; I think it deserves a feature request. – user2357112 Jun 28 '16 at 20:43
  • (Calling `__del__` manually breaks things, but of course that would break things. Trying to protect against that is like trying to protect against someone just straight-up calling `free` on your pointer and not telling you.) – user2357112 Jun 28 '16 at 20:59
  • @user2357112 Why should calling `__del__()` "break things" in this particular case? If `weakref.proxy` behaves as I expect it to then assigning `_buffer = None` ought to result in a `ReferenceError` regardless of whether or not the pointer has been freed. Indeed, this is exactly what happens if I don't create a second view onto `buf[:]` before I call `wrap2.__del__()`. – ali_m Jun 28 '16 at 21:19
  • @ali_m: Assigning `_buffer = None` doesn't free `_buffer`, because the other array still has a reference to it. If you manually call a function that frees your pointer before your pointer is ready to be freed, stuff's going to break. – user2357112 Jun 28 '16 at 21:26
  • @user2357112 You're right, that was stupid of me. – ali_m Jun 28 '16 at 21:34
  • This is somewhat unrelated to your question, but it seems like it would be nice if your `MyWrapper` class were a [context manager](https://docs.python.org/2/reference/datamodel.html#context-managers) so that it could be used in a [with](https://docs.python.org/2/reference/compound_stmts.html#with) statement. – rkersh Jun 28 '16 at 22:07
  • This sounds like a rather dangerous solution. What's the use-case, exactly? Can you not just implement all the desired functionality in C, and use `ctypes` to hit your "C library" directly? – Aya Jun 30 '16 at 13:29
  • @Aya It's a proprietary library written by a third party and distributed as a binary. I could call the same library functions from C rather than Python, but that wouldn't help much since I still don't have any access to the code that actually allocates and frees the buffers. I can't, for example, allocate the buffers myself and then pass them to the library as pointers. – ali_m Jun 30 '16 at 15:29
  • Could you tidy up your question a bit, removing the now-irrelevant parts? – ivan_pozdeev Jul 01 '16 at 04:18
  • @ali_m Unfortunately, I think however you do it, if you allow the possibility for the Python code to make the call which `free(3)`s the memory, AND expose the memory buffer directly to Python, there's always a chance that the Python code can do a free memory read/write, which would be bad. I presume disallowing the `free(3)`, and thus creating a potential memory leak, is not a reasonable option? – Aya Jul 02 '16 at 18:13
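
The context-manager suggestion in the comments above might look roughly like the sketch below. It is only an illustration: ManagedBuffer is a hypothetical name, and a ctypes-owned array stands in for the library's buffer so that nothing is really freed.

```python
import ctypes as C
import numpy as np

class ManagedBuffer(object):
    """Hypothetical wrapper; a ctypes array stands in for the library buffer."""

    def __init__(self, n=10):
        self._n = n
        self._cbuf = None

    def __enter__(self):
        # a real wrapper would call the library's constructor here
        self._cbuf = (C.c_int * self._n)()
        return np.ctypeslib.as_array(self._cbuf)

    def __exit__(self, exc_type, exc_value, traceback):
        # a real wrapper would call the library's destructor here
        self._cbuf = None
        return False          # do not swallow exceptions

with ManagedBuffer(4) as buf:
    buf[:] = np.arange(4)
    total = int(buf.sum())

print(total)
```

This only scopes the allocation. With a real library, any array that escapes the with block would still be a dangling view, so it does not replace the reference-keeping approaches in the answers.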

6 Answers

You have to keep a reference to your wrapper for as long as any numpy array backed by its buffer exists. The easiest way to achieve this is to store the reference in an attribute of the ctypes buffer:

class MyWrapper(object):
    def __init__(self, n=10):
        # buffer allocated by external library
        self.size = n
        self.addr = libc.malloc(C.sizeof(C.c_int) * n)

    def __del__(self):
        # buffer freed by external library
        libc.free(self.addr)

    @property
    def buffer(self):
        buf = (C.c_int * self.size).from_address(self.addr)
        buf._wrapper = self
        return np.ctypeslib.as_array(buf)

This way your wrapper is automatically freed when the last reference to it, e.g. the last numpy array, is garbage collected.

Daniel

"It's a proprietary library written by a third party and distributed as a binary. I could call the same library functions from C rather than Python, but that wouldn't help much since I still don't have any access to the code that actually allocates and frees the buffers. I can't, for example, allocate the buffers myself and then pass them to the library as pointers."

You could, however, wrap the buffer in a Python extension type. That way you can expose only the interface you want to be available, and let the extension type automatically handle the freeing of the buffer. That way it's not possible for the Python API to do a free memory read/write.


mybuffer.c

#include <python3.3/Python.h>

// Hardcoded values
// N.B. Most of these are only needed for defining the view in the Python
// buffer protocol
static long external_buffer_size = 32;                // Size of buffer in bytes
static Py_ssize_t external_buffer_shape[] = { 32 };   // Number of items for each dimension
static Py_ssize_t external_buffer_strides[] = { 1 };  // Stride in bytes for each dimension

//----------------------------------------------------------------------------
// Code to simulate the third-party library
//----------------------------------------------------------------------------

// Allocate a new buffer
static void* external_buffer_allocate()
{
    // Allocate the memory
    void* ptr = malloc(external_buffer_size);

    // Debug
    printf("external_buffer_allocate() = 0x%lx\n", (long) ptr);

    // Fill buffer with a recognizable pattern
    int i;
    for (i = 0; i < external_buffer_size; ++i)
    {
        *((char*) ptr + i) = i;
    }

    // Done
    return ptr;
}

// Free an existing buffer
static void external_buffer_free(void* ptr)
{
    // Debug
    printf("external_buffer_free(0x%lx)\n", (long) ptr);

    // Release the memory
    free(ptr);
}


//----------------------------------------------------------------------------
// Define a new Python instance object for the external buffer
// See: https://docs.python.org/3/extending/newtypes.html
//----------------------------------------------------------------------------

typedef struct
{
    // Python macro to include standard members, like reference count
    PyObject_HEAD

    // Base address of allocated memory
    void* ptr;
} BufferObject;


//----------------------------------------------------------------------------
// Define the instance methods for the new object
//----------------------------------------------------------------------------

// Called when there are no more references to the object
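static void BufferObject_dealloc(BufferObject* self)
{
    // Release the external buffer
    external_buffer_free(self->ptr);

    // Release the memory of the Python object itself
    Py_TYPE(self)->tp_free((PyObject*) self);
}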

// Called when we want a new view of the buffer, using the buffer protocol
// See: https://docs.python.org/3/c-api/buffer.html
static int BufferObject_getbuffer(BufferObject *self, Py_buffer *view, int flags)
{
    // Set the view info
    view->obj = (PyObject*) self;
    view->buf = self->ptr;                      // Base pointer
    view->len = external_buffer_size;           // Length
    view->readonly = 0;
    view->itemsize = 1;
    view->format = "B";                         // unsigned byte
    view->ndim = 1;
    view->shape = external_buffer_shape;
    view->strides = external_buffer_strides;
    view->suboffsets = NULL;
    view->internal = NULL;

    // We need to increase the reference count of our buffer object here, but
    // Python will automatically decrease it when the view goes out of scope
    Py_INCREF(self);

    // Done
    return 0;
}

//----------------------------------------------------------------------------
// Define the struct required to implement the buffer protocol
//----------------------------------------------------------------------------

static PyBufferProcs BufferObject_as_buffer =
{
    // Create new view
    (getbufferproc) BufferObject_getbuffer,

    // Release an existing view
    (releasebufferproc) 0,
};


//----------------------------------------------------------------------------
// Define a new Python type object for the external buffer
//----------------------------------------------------------------------------

static PyTypeObject BufferType =
{
    PyVarObject_HEAD_INIT(NULL, 0)
    "external buffer",                  /* tp_name */
    sizeof(BufferObject),               /* tp_basicsize */
    0,                                  /* tp_itemsize */
    (destructor) BufferObject_dealloc,  /* tp_dealloc */
    0,                                  /* tp_print */
    0,                                  /* tp_getattr */
    0,                                  /* tp_setattr */
    0,                                  /* tp_reserved */
    0,                                  /* tp_repr */
    0,                                  /* tp_as_number */
    0,                                  /* tp_as_sequence */
    0,                                  /* tp_as_mapping */
    0,                                  /* tp_hash  */
    0,                                  /* tp_call */
    0,                                  /* tp_str */
    0,                                  /* tp_getattro */
    0,                                  /* tp_setattro */
    &BufferObject_as_buffer,            /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT,                 /* tp_flags */
    "External buffer",                  /* tp_doc */
    0,                                  /* tp_traverse */
    0,                                  /* tp_clear */
    0,                                  /* tp_richcompare */
    0,                                  /* tp_weaklistoffset */
    0,                                  /* tp_iter */
    0,                                  /* tp_iternext */
    0,                                  /* tp_methods */
    0,                                  /* tp_members */
    0,                                  /* tp_getset */
    0,                                  /* tp_base */
    0,                                  /* tp_dict */
    0,                                  /* tp_descr_get */
    0,                                  /* tp_descr_set */
    0,                                  /* tp_dictoffset */
    (initproc) 0,                       /* tp_init */
    0,                                  /* tp_alloc */
    0,                                  /* tp_new */
};


//----------------------------------------------------------------------------
// Define a Python function to put in the module which creates a new buffer
//----------------------------------------------------------------------------

static PyObject* mybuffer_create(PyObject *self, PyObject *args)
{
    BufferObject* buf = (BufferObject*)(&BufferType)->tp_alloc(&BufferType, 0);
    buf->ptr = external_buffer_allocate();
    return (PyObject*) buf;
}


//----------------------------------------------------------------------------
// Define the set of all methods which will be exposed in the module
//----------------------------------------------------------------------------

static PyMethodDef mybufferMethods[] =
{
    {"create", mybuffer_create, METH_VARARGS, "Create a buffer"},
    {NULL, NULL, 0, NULL}        /* Sentinel */
};


//----------------------------------------------------------------------------
// Define the module
//----------------------------------------------------------------------------

static PyModuleDef mybuffermodule = {
    PyModuleDef_HEAD_INIT,
    "mybuffer",
    "Example module that creates an extension type.",
    -1,
    mybufferMethods
    //NULL, NULL, NULL, NULL, NULL
};


//----------------------------------------------------------------------------
// Define the module's entry point
//----------------------------------------------------------------------------

PyMODINIT_FUNC PyInit_mybuffer(void)
{
    PyObject* m;

    if (PyType_Ready(&BufferType) < 0)
        return NULL;

    m = PyModule_Create(&mybuffermodule);
    if (m == NULL)
        return NULL;

    return m;
}

test.py

#!/usr/bin/env python3

import numpy as np
import mybuffer

def test():

    print('Create buffer')
    b = mybuffer.create()

    print('Print buffer')
    print(b)

    print('Create memoryview')
    m = memoryview(b)

    print('Print memoryview shape')
    print(m.shape)

    print('Print memoryview format')
    print(m.format)

    print('Create numpy array')
    a = np.asarray(b)

    print('Print numpy array')
    print(repr(a))

    print('Change every other byte in numpy')
    a[::2] += 10

    print('Print numpy array')
    print(repr(a))

    print('Change first byte in memory view')
    m[0] = 42

    print('Print numpy array')
    print(repr(a))

    print('Delete buffer')
    del b

    print('Delete memoryview')
    del m

    print('Delete numpy array - this is the last ref, so should free memory')
    del a

    print('Memory should be free before this line')

if __name__ == '__main__':
    test()

Example

$ gcc -fPIC -shared -o mybuffer.so mybuffer.c -lpython3.3m
$ ./test.py
Create buffer
external_buffer_allocate() = 0x290fae0
Print buffer
<external buffer object at 0x7f7231a2cc60>
Create memoryview
Print memoryview shape
(32,)
Print memoryview format
B
Create numpy array
Print numpy array
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], dtype=uint8)
Change every other byte in numpy
Print numpy array
array([10,  1, 12,  3, 14,  5, 16,  7, 18,  9, 20, 11, 22, 13, 24, 15, 26,
       17, 28, 19, 30, 21, 32, 23, 34, 25, 36, 27, 38, 29, 40, 31], dtype=uint8)
Change first byte in memory view
Print numpy array
array([42,  1, 12,  3, 14,  5, 16,  7, 18,  9, 20, 11, 22, 13, 24, 15, 26,
       17, 28, 19, 30, 21, 32, 23, 34, 25, 36, 27, 38, 29, 40, 31], dtype=uint8)
Delete buffer
Delete memoryview
Delete numpy array - this is the last ref, so should free memory
external_buffer_free(0x290fae0)
Memory should be free before this line
Aya

I liked @Vikas's approach, but when I tried it, I only got a Numpy object-array of a single FreeOnDel object. The following is much simpler and works:

import numpy

class FreeOnDel(object):
    def __init__(self, data, shape, dtype, readonly=False):
        self.__array_interface__ = {"version": 3,
                                    "typestr": numpy.dtype(dtype).str,
                                    "data": (data, readonly),
                                    "shape": shape}
    def __del__(self):
        data = self.__array_interface__["data"][0]      # integer ptr
        print("do what you want with the data at {}".format(data))

view = numpy.array(FreeOnDel(ptr, shape, dtype), copy=False)

where ptr is a pointer to the data as an integer (e.g. the result of ctypes.addressof(...)).

This __array_interface__ attribute is sufficient to tell Numpy how to cast a region of memory as an array, and then the FreeOnDel object becomes that array's base. When the array is deleted, the deletion is propagated to the FreeOnDel object, where you can call libc.free.

I might even call this FreeOnDel class "BufferOwner", because that's its role: to track ownership.
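
As a rough sketch of that idea, the owner can take the release function as a parameter. Here BufferOwner and its free argument are illustrative names, a ctypes-owned array stands in for library memory, and freed.append stands in for a call such as libc.free; numpy.asarray is used in place of numpy.array(..., copy=False), which should behave the same here since no copy is needed.

```python
import ctypes
import numpy

class BufferOwner(object):
    """Owns a raw pointer and calls `free` once no array refers to it."""

    def __init__(self, ptr, shape, dtype, free, readonly=False):
        self._free = free
        self.__array_interface__ = {"version": 3,
                                    "typestr": numpy.dtype(dtype).str,
                                    "data": (ptr, readonly),
                                    "shape": shape}

    def __del__(self):
        self._free(self.__array_interface__["data"][0])

# Stand-in for memory owned by the library
backing = (ctypes.c_int32 * 10)()
freed = []                    # records the addresses passed to `free`

owner = BufferOwner(ctypes.addressof(backing), (10,), numpy.int32, freed.append)
view = numpy.asarray(owner)
del owner                     # the array now holds the only reference
view[:] = numpy.arange(10)
half = view[::2]

del view
print(freed)                  # still empty while `half` exists

del half
print(freed)                  # the owner has been released
```

The release callback should run exactly once, after the last array backed by the pointer is gone (this relies on CPython's reference counting).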

Jim Pivarski

weakref is a built-in mechanism for the functionality you are proposing. Specifically, weakref.proxy is an object with the same interface as the referred one. After the referenced object's disposal, any operation on the proxy raises weakref.ReferenceError. You don't even need numpy:

In [2]: buffer=(c.c_int*100)()   #acts as an example for an externally allocated buffer
In [3]: voidp=c.addressof(buffer)

In [10]: a=(c.c_int*100).from_address(voidp) # python object accessing the buffer.
                 # Here it's created from raw address value. It's better to use function
                 # prototypes instead for some type safety.
In [14]: ra=weakref.proxy(a)

In [15]: a[1]=1
In [16]: ra[1]
Out[16]: 1

In [17]: del a
In [18]: ra[1]
ReferenceError: weakly-referenced object no longer exists

In [20]: buffer[1]
Out[20]: 1

As you can see, in any case, you need a normal Python object over the C buffer. If an external library owns the memory, the object must be deleted before the buffer is freed on the C level. If you own the memory yourself, you just create a ctypes object the normal way, then it will be freed when it's deleted.

So, if your external library owns the memory and can free at any time (your specification is vague about this), it must tell you somehow it's about to do so - otherwise, you have no way to know about that to take necessary action.

ivan_pozdeev
  • Thanks for your suggestion. Unfortunately I don't think that `weakref.proxy` can properly deal with cases where multiple numpy arrays are backed by the same buffer (see my update). – ali_m Jun 28 '16 at 20:25
  • I've added some clarification to the question. Although the allocation/freeing of the buffers happens outside of Python, I can indirectly control when this happens by calling wrapped constructor/destructor functions. – ali_m Jun 28 '16 at 21:42
  • The problem is not with `weakref` but with `view`. Numpy objects that don't own their memory appear to be inherently unsafe. All you can do is hook view creation to likewise store a strong reference and return a weak reference. Or, like, _don't use views at all_ (as I showed, you don't even need `ndarray`s, `ctypes` arrays work just as well). – ivan_pozdeev Jun 28 '16 at 21:59
  • `ctypes` arrays would be fine if all I wanted was a container, but I actually want to perform vectorized computations using the contents of the buffers. I think that @Daniel's approach might be sufficient, since views and slices will hold references to the parent array from which they were derived via their `.base` attribute. As long as the initial array created from the pointer holds a reference to its parent object then that should be sufficient to ensure that the parent isn't deleted before all other views onto that memory are also gone. – ali_m Jun 28 '16 at 22:01
  • @ali_m but you told us that the buffer is deleted by the lib "from beneath" the object - so it has to have a way to actively signal everyone involved that it's become invalid, however many Python references there are to it. – ivan_pozdeev Jun 28 '16 at 22:07
  • As I said, I have high-level control over when the buffers are freed. Provided that all views of the buffer hold (possibly indirect) strong references to my top-level Python class, I can call "`lib.free_buffer()`" from within its `__del__()` method and trust that the C pointers are only freed once all of the arrays backed by those pointers have also been destroyed. – ali_m Jun 28 '16 at 22:11
  • @ali_m Then I don't see how views are unsafe. You just mustn't call `__del__()` directly but rely on reference counting to do that for you. In fact, in your example, calling `__del__()` didn't cause the object to be destroyed - that's the reason for unexpected results. – ivan_pozdeev Jul 01 '16 at 04:22
  • @ali_m If a buffer can never be freed while any Python references to the object exist, weakrefs are indeed not needed. They are for cases when one needs to dispose an object without regard to whether references exist or not. – ivan_pozdeev Jul 01 '16 at 04:22

You just need to put the ctypes object in a wrapper that has an additional __del__ method, and pass that wrapper to numpy.ctypeslib.as_array.

class FreeOnDel(object):
    def __init__(self, ctypes_ptr):
        # This is not needed if you are dealing with ctypes.POINTER() objects
        # Start of hack for ctypes ARRAY type;
        if not hasattr(ctypes_ptr, 'contents'):
            # For static ctypes arrays, the length and type are stored
            # in the type() rather than object. numpy queries these 
            # properties to find out the shape and type, hence needs to be 
            # copied. I wish type() properties could be automated by 
            # __getattr__ too
            type(self)._length_ = type(ctypes_ptr)._length_
            type(self)._type_ = type(ctypes_ptr)._type_
        # End of hack for ctypes ARRAY type;

        # cannot call self._ctypes_ptr = ctypes_ptr because of recursion
        super(FreeOnDel, self).__setattr__('_ctypes_ptr', ctypes_ptr)

    # numpy.ctypeslib.as_array function sets the __array_interface__
    # on type(ctypes_ptr) which is not called by __getattr__ wrapper
    # Hence this additional wrapper.
    @property
    def __array_interface__(self):
        return self._ctypes_ptr.__array_interface__

    @__array_interface__.setter
    def __array_interface__(self, value):
        self._ctypes_ptr.__array_interface__ = value

    # This is the only additional method we need; the rest is overhead
    def __del__(self):
        addr = ctypes.addressof(self._ctypes_ptr)
        print("freeing address %x" % addr)
        libc.free(addr)
        # Need to be called on all object members
        # object.__del__(self) does not work
        del self._ctypes_ptr

    def __getattr__(self, attr):
        return getattr(self._ctypes_ptr, attr)

    def __setattr__(self, attr, val):
        setattr(self._ctypes_ptr, attr, val)

To test

In [32]: import ctypes as C

In [33]: n = 10

In [34]: libc = C.CDLL("libc.so.6")

In [35]: addr = libc.malloc(C.sizeof(C.c_int) * n)

In [36]: cbuf = (C.c_int * n).from_address(addr)

In [37]: wrap = FreeOnDel(cbuf)

In [38]: sb = np.ctypeslib.as_array(wrap, (10,))

In [39]: sb[:] = np.arange(10)

In [40]: print(repr(sb))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

In [41]: print(repr(sb[::2]))
array([0, 2, 4, 6, 8], dtype=int32)

In [42]: sbv = sb.view(np.double)

In [43]: print(repr(sbv))
array([  2.12199579e-314,   6.36598737e-314,   1.06099790e-313,
         1.48539705e-313,   1.90979621e-313])

In [45]: buf2 = sb[:8]

In [46]: sb[::2] += 10

In [47]: del cbuf   # Memory not freed because this does not have __del__

In [48]: del wrap   # Memory not freed because sb, sbv, buf2 have references

In [49]: del sb     # Memory not freed because sbv, buf2 have references

In [50]: del buf2   # Memory not freed because sbv has reference

In [51]: del sbv    # Memory freed because no more references
freeing address 2bc6bc0

In fact, an easier solution is to override the __del__ function:

In [7]: olddel = getattr(cbuf, '__del__', lambda: 0)

In [8]: cbuf.__del__ = lambda self : libc.free(C.addressof(self)), olddel

In [10]: import numpy as np

In [12]: sb = np.ctypeslib.as_array(cbuf, (10,))

In [13]: sb[:] = np.arange(10)

In [14]: print(repr(sb))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

In [15]: print(repr(sb))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

In [16]: print(repr(sb[::2]))
array([0, 2, 4, 6, 8], dtype=int32)

In [17]: sbv = sb.view(np.double)

In [18]: print(repr(sbv))
array([  2.12199579e-314,   6.36598737e-314,   1.06099790e-313,
         1.48539705e-313,   1.90979621e-313])

In [19]: buf2 = sb[:8]

In [20]: sb[::2] += 10

In [22]: del cbuf   # Memory not freed

In [23]: del sb     # Memory not freed because sbv, buf2 have references

In [24]: del buf2   # Memory not freed because sbv has reference

In [25]: del sbv    # Memory freed because no more references
Vikas

If you can completely control the C buffer's lifetime from Python, what you essentially have is a Python "buffer" object that an ndarray should use.

Thus,

  • there are 2 fundamental ways to connect them:
    • buffer -> ndarray
    • ndarray -> buffer
  • there's also a question how to implement the buffer itself

buffer -> ndarray

This is unsafe: there's nothing automatically holding a reference to the buffer for the lifetime of the ndarray. Introducing a third object to hold references to both isn't any better: then you just have to keep track of the third object instead of the buffer.

ndarray -> buffer

"Now you're talking!" Since the very task at hand is "a buffer that an ndarray should use", this is the natural way to go.

In fact, numpy has a built-in mechanism: any ndarray that doesn't own its memory holds a reference to the object that does in its base attribute (thus preventing the latter from being garbage collected). For views, the attribute is automatically assigned accordingly (to the parent object if its base is None or to the parent's base).
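
A small illustration of that mechanism, using an ordinary array that owns its memory (the variable names are arbitrary):

```python
import numpy as np

owner = np.arange(10)         # allocates and owns its memory
first = owner[::2]            # a view of owner
second = first[1:]            # a view of a view

print(owner.flags.owndata, owner.base is None)
print(first.flags.owndata, first.base is owner)
print(second.base is owner)   # base points at the owner, not at `first`
```

Because every view refers back to the object that owns the memory, that object cannot be garbage collected while any view exists.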

The catch is that you cannot just place any old object there. Instead, the attribute is filled in by a constructor, and the suggested object is first put through its scrutiny.

So, if only we could construct some custom object that numpy.array accepts and considers eligible for memory reuse (numpy.ctypeslib.as_array is actually a wrapper for numpy.array(copy=False) with a few sanity checks)...

<...>

ivan_pozdeev