I am writing a Python class that wraps a C module containing a C struct. I am using Cython (a superset of Python that can call C code directly). The C struct is malloc'd in the constructor and contains an array that I want to use in Python. The array will be represented in Python as a NumPy array, but I don't want to copy the values into it. I want to link the NumPy array directly to the malloc'd memory. For this task I use the NumPy Array API, specifically this function:

PyObject* PyArray_SimpleNewFromData(int nd, npy_intp* dims, int typenum, void* data)

I managed to bind the NumPy array to the C struct's array using this code in Cython and it works well as long as the NumPy array and MultimediaParams object have the same lifetime:

cdef class MultimediaParams:
    def __init__(self, **kwargs):
        self._mm_np = < mm_np *> malloc(sizeof(mm_np))
        #some code...

    def as_ndarray(self): #TODO: what if self deallocated but numpy array still exists(segfault?)
        cdef numpy.npy_intp shape[1]
        cdef int arr_size = sizeof(self._mm_np[0].n2) / sizeof(self._mm_np[0].n2[0])
        shape[0] = < numpy.npy_intp > arr_size
        cdef numpy.ndarray ndarray
        ndarray = numpy.PyArray_SimpleNewFromData(1, shape, numpy.NPY_DOUBLE, self._mm_np[0].n2)

        return ndarray

    def __dealloc__(self):
        free(self._mm_np)

As you can see, the class has a __dealloc__ method that frees the memory allocated in C once there are no references left to the MultimediaParams instance.

With this kind of binding, NumPy does not own the array's memory.

The problem: when the MultimediaParams object is deallocated and the array's memory is freed, the NumPy object still points to the memory that was just freed. This can cause a segfault when the NumPy object tries to access or modify that memory.

How can I make sure the MultimediaParams object is not deallocated as long as there is a NumPy object using its memory?

As I understand it, all I need to do is make the NumPy object hold a reference to the MultimediaParams instance that owns the memory it points to. I tried ndarray.base = <PyObject*>self so that NumPy knows its base object; this was supposed to add another reference to the MultimediaParams instance and keep it from being deallocated as long as the NumPy array is alive. Instead, this line causes my tests to fail because the contents of the NumPy array turn to garbage.

CLARIFICATION: The NumPy array does not take ownership of the C array memory and I don't want it to. I want MultimediaParams to be responsible for freeing the C struct (that contains the array data), but not to do it as long as the NumPy object is alive.

Any suggestions?

Max Segal
  • related to the title: [How to convert pointer to c array to python array](http://stackoverflow.com/q/7543675/4279) – jfs Nov 02 '15 at 13:20
  • @DavidW: it doesn't look like a duplicate. Forcing the memory ownership probably leads to a double `free()`, which is not good. OP's issue is that [`ndarray.base = self` doesn't work](http://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_SetBaseObject) – jfs Nov 02 '15 at 15:37
  • what happens if you [`Py_INCREF(self)`](https://gist.github.com/GaelVaroquaux/1249305) after `.base` assignment? – jfs Nov 02 '15 at 15:46
  • @J.F.Sebastian I was thinking the problem OP really wanted to solve was to ensure that the memory has the same lifetime as the array (and was trying to solve it by tying it to an instance of `MultimediaParams`). But it's possible that they may have other reasons for linking the two instances - in which case I'm wrong and it isn't a duplicate... – DavidW Nov 02 '15 at 15:55
  • @DavidW: how would `PyArray_ENABLEFLAGS(ndarray, np.NPY_OWNDATA)` keep the `MultimediaParams` instance alive? – jfs Nov 02 '15 at 15:59
  • @J.F.Sebastian It wouldn't - the data would be owned by the numpy array. But the `MultimediaParams` instance could keep a reference to the numpy array instead of the raw C array and all the lifetimes would be linked in the way OP wants. It is possible I have missed the point though (and if OP tells me that I have, I'll delete my answer) – DavidW Nov 02 '15 at 16:10
  • So, based on the edit I did misunderstand what was wanted - I've deleted my answer (and a few comments). I don't know how to achieve what you're after though. – DavidW Nov 03 '15 at 07:25

1 Answer

As @J.F.Sebastian's comment suggests, the problem is most likely that, although you correctly assign a pointer to your MultimediaParams instance to the base reference of the NumPy array, you don't actually increase its reference count, because the assignment is made in C, not in Python. This probably leads to premature deallocation of the MultimediaParams object, whose memory is then reused, which is what you experience as garbage data in the ndarray.

Manually incrementing the reference count of the MultimediaParams object using the macro Py_INCREF should yield the desired behavior.
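A minimal sketch of what that could look like, not a drop-in fix. The question does not show the definition of mm_np, so the struct below (a three-element n2 array) is a placeholder assumption; in real code it would come from the wrapped C header. The sketch sets the base through PyArray_SetBaseObject (the function linked in the comments), which steals a reference to the base object, so Py_INCREF(self) is called first. It also allocates in __cinit__ rather than __init__, which is my choice and not something the question requires.

```cython
# Sketch only: mm_np is a hypothetical stand-in for the real C struct.
from cpython.ref cimport Py_INCREF
from libc.stdlib cimport malloc, free
cimport numpy

numpy.import_array()  # must run once before any NumPy C-API call

cdef struct mm_np:
    double n2[3]  # assumed layout, for illustration only

cdef class MultimediaParams:
    cdef mm_np* _mm_np

    def __cinit__(self):
        self._mm_np = <mm_np*> malloc(sizeof(mm_np))
        if self._mm_np == NULL:
            raise MemoryError()

    def as_ndarray(self):
        cdef numpy.npy_intp shape[1]
        cdef numpy.ndarray arr
        shape[0] = <numpy.npy_intp> (
            sizeof(self._mm_np[0].n2) // sizeof(self._mm_np[0].n2[0]))
        # The array borrows the struct's memory; NumPy does not own it.
        arr = numpy.PyArray_SimpleNewFromData(
            1, shape, numpy.NPY_DOUBLE, <void*> self._mm_np[0].n2)
        # PyArray_SetBaseObject steals a reference, so take one first.
        # The array releases it when the array itself is destroyed.
        Py_INCREF(self)
        numpy.PyArray_SetBaseObject(arr, self)
        return arr

    def __dealloc__(self):
        # Runs only after every array returned above has been destroyed.
        free(self._mm_np)
```

With this in place, MultimediaParams stays responsible for freeing the struct, but __dealloc__ cannot run while an array returned by as_ndarray() is still alive. Recent NumPy and Cython .pxd files also ship a set_array_base(arr, base) helper that does the INCREF and the SetBaseObject call together; check that your version provides it before relying on it.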

Henrik
  • OK, but one question: how will Py_INCREF know that specifically the ndarray object is pointing to that MultimediaParams object? – Max Segal Nov 09 '15 at 10:29
  • Python keeps track of the number of references to an object; this is known as the reference count. It doesn't need to know which references they are. Whenever a reference to an object is created, the object's reference count is incremented, and it is decremented when a reference is removed. You create a reference by setting the base property of the ndarray, but this is done in C, where Python doesn't increment the reference count of the MultimediaParams object for you, so you need to do it manually. – Henrik Nov 09 '15 at 10:36