Passing C++ vector to Numpy through Cython without copying and taking care of memory management automatically

Question

Dealing with processing large matrices (NxM with 1K <= N <= 20K & 10K <= M <= 200K), I often need to pass Numpy matrices to C++ through Cython to get the job done and this works as expected & without copying.

However, there are times when I need to initiate and preprocess a matrix in C++ and pass it to Numpy (Python 3.6). Let's assume the matrices are linearized (so the size is N*M and it's a 1D matrix - col/row major doesn't matter here). Following the information in here: exposing C-computed arrays in Python without data copies & modifying it for C++ compatibility, I'm able to pass C++ array.

The problem is if I want to use std vector instead of initiating array, I'd get Segmentation fault. For example, considering the following files:

fast.h

#include <iostream>
#include <vector>

using std::cout; using std::endl; using std::vector;
int* doit(int length);

fast.cpp

#include "fast.h"
int* doit(int length) {
    // Something really heavy
    cout << "C++: doing it fast " << endl; 

    vector<int> WhyNot;

    // Heavy stuff - like reading a big file and preprocessing it
    for(int i=0; i<length; ++i)
        WhyNot.push_back(i); // heavy stuff

    cout << "C++: did it really fast" << endl;
    return &WhyNot[0]; // or WhyNot.data()
}

faster.pyx

cimport numpy as np
import numpy as np
from libc.stdlib cimport free
from cpython cimport PyObject, Py_INCREF

np.import_array()

cdef extern from "fast.h":
    int* doit(int length)

cdef class ArrayWrapper:
    cdef void* data_ptr
    cdef int size

    cdef set_data(self, int size, void* data_ptr):
        self.data_ptr = data_ptr
        self.size = size

    def __array__(self):
        print ("Cython: __array__ called")
        cdef np.npy_intp shape[1]
        shape[0] = <np.npy_intp> self.size
        ndarray = np.PyArray_SimpleNewFromData(1, shape,
                                               np.NPY_INT, self.data_ptr)
        print ("Cython: __array__ done")
        return ndarray

    def __dealloc__(self):
        print("Cython: __dealloc__ called")
        free(<void*>self.data_ptr)
        print("Cython: __dealloc__ done")


def faster(length):
    print("Cython: calling C++ function to do it")
    cdef int *array = doit(length)
    print("Cython: back from C++")
    cdef np.ndarray ndarray
    array_wrapper = ArrayWrapper()
    array_wrapper.set_data(length, <void*> array)
    print("Ctyhon: array wrapper set")
    ndarray = np.array(array_wrapper, copy=False)
    ndarray.base = <PyObject*> array_wrapper
    Py_INCREF(array_wrapper)
    print("Cython: all done - returning")
    return ndarray

setup.py

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy 

ext_modules = [Extension(
    "faster", 
    ["faster.pyx", "fast.cpp"], 
    language='c++',
    extra_compile_args=["-std=c++11"],
    extra_link_args=["-std=c++11"]
)]

setup(
    cmdclass = {'build_ext': build_ext}, 
    ext_modules = ext_modules,
    include_dirs=[numpy.get_include()]
)

If you build this with

python setup.py build_ext --inplace

and run Python 3.6 interpreter, if you enter the following you'd get seg fault after a couple of tries.

>>> from faster import faster
>>> a = faster(1000000)
Cython: calling C++ function to do it
C++: doing it fast
C++: did it really fast
Cython: back from C++
Ctyhon: array wrapper set
Cython: __array__ called
Cython: __array__ done
Cython: all done - returning
>>> a = faster(1000000)
Cython: calling C++ function to do it
C++: doing it fast
C++: did it really fast
Cython: back from C++
Ctyhon: array wrapper set
Cython: __array__ called
Cython: __array__ done
Cython: all done - returning
Cython: __dealloc__ called
Segmentation fault (core dumped)

Couple of things to note:

If you use array instead of vector (in fast.cpp) this would work like a charm!
If you call faster(1000000) and put the result into something other than variable a this would work.

If you enter smaller number like faster(10) you'd get a more detailed info like:

Cython: calling C++ function to do it
C++: doing it fast
C++: did it really fast
Cython: back from C++
Ctyhon: array wrapper set
Cython: __array__ called
Cython: __array__ done
Cython: all done - returning
Cython: __dealloc__ called <--- Perhaps this happened too early or late?
*** Error in 'python': double free or corruption (fasttop): 0x0000000001365570 ***
======= Backtrace: =========
More info here ....

It's really puzzling that why this doesn't happen with arrays? No matter what!

I make use of vectors a lot and would love to be able to use them in these scenarios.

@AndyG when do you think this is happening? When the `doit` function gets called the second time? Wouldn't that initiate a new vector? or it is basically resizing the previously filled vector? if so, why? — NULL, Jul 16 '17 at 21:15
I didn't actually look at your code before, sorry. Your code has undefined behavior because it's returning a reference to a temporary. The vector goes out of scope after `doit`. You'd have the same issue with arrays (I'm assuming `std::array` here). That you get a segmentation fault at all you can count your blessings, because it could just silently give you garbage instead. With dynamically allocated arrays (C-style array) you wouldn't have this issue because the memory isn't being deallocated. I presume you can hand the memory off to Cython to own, then? Otherwise you'd have memory leaks — AndyG, Jul 16 '17 at 21:25
@AndyG I see, yeah that make sense. Dynamic arrays (C-style arrays) work just fine as I mentioned. Is there anyway that I can avoid copying vector to array for return? — NULL, Jul 16 '17 at 21:49
Not in a way that would be desirable. A C-Style array appears to be the best route to go here, just make sure that Cython properly manages the memory. — AndyG, Jul 16 '17 at 21:51
Just in case someone reads this - I see nothing wrong with implementing heap creation here: replacing `vector WhyNot;` with `vector *WhyNot = new vector`, using `WhyNot->pushback(i);` and returning `WhyNot->data();`? — pthibault, Apr 04 '20 at 22:37

DavidW · Accepted Answer · 2020-01-09T22:30:33.233

I think @FlorianWeimer's answer provides a decent solution (allocate a vector and pass that into your C++ function) but it should be possible to return a vector from doit and avoid copies by using the move constructor.

from libcpp.vector cimport vector

cdef extern from "<utility>" namespace "std" nogil:
  T move[T](T) # don't worry that this doesn't quite match the c++ signature

cdef extern from "fast.h":
    vector[int] doit(int length)

# define ArrayWrapper as holding in a vector
cdef class ArrayWrapper:
    cdef vector[int] vec
    cdef Py_ssize_t shape[1]
    cdef Py_ssize_t strides[1]

    # constructor and destructor are fairly unimportant now since
    # vec will be destroyed automatically.

    cdef set_data(self, vector[int]& data):
       self.vec = move(data)
       # @ead suggests `self.vec.swap(data)` instead
       # to avoid having to wrap move

    # now implement the buffer protocol for the class
    # which makes it generally useful to anything that expects an array
    def __getbuffer__(self, Py_buffer *buffer, int flags):
        # relevant documentation http://cython.readthedocs.io/en/latest/src/userguide/buffer.html#a-matrix-class
        cdef Py_ssize_t itemsize = sizeof(self.vec[0])

        self.shape[0] = self.vec.size()
        self.strides[0] = sizeof(int)
        buffer.buf = <char *>&(self.vec[0])
        buffer.format = 'i'
        buffer.internal = NULL
        buffer.itemsize = itemsize
        buffer.len = self.v.size() * itemsize   # product(shape) * itemsize
        buffer.ndim = 1
        buffer.obj = self
        buffer.readonly = 0
        buffer.shape = self.shape
        buffer.strides = self.strides
        buffer.suboffsets = NULL

You should then be able to use it as:

cdef vector[int] array = doit(length)
cdef ArrayWrapper w
w.set_data(array) # "array" itself is invalid from here on
numpy_array = np.asarray(w)

Edit: Cython isn't hugely good with C++ templates - it insists on writing std::move<vector<int>>(...) rather than std::move(...) then letting C++ deduce the types. This sometimes causes problems with std::move. If you're having issues with it then the best solution is usually to tell Cython about only the overloads you want:

 cdef extern from "<utility>" namespace "std" nogil:
    vector[int] move(vector[int])

Awesome! Just couple of questions. 1- where is a good doc or source about types of values buffer format get, like 'i', 'f', etc? 2- So numpy takes care of releasing memory from now on? — NULL, Jul 18 '17 at 13:48
1. The buffer formats largely match those used by the numpy array module https://docs.python.org/3/library/array.html. 2. Yes - in this scheme the `ArrayWrapper` holds the memory and numpy keeps a reference to that alive for as long as is needed, so you don't need to do anything yourself. — DavidW, Jul 18 '17 at 16:25
I tried to use your array wrapper and my compiler complaints that move 'move' is not a member of 'std'. One more question, what if we pass the vector[int] array directly to numpy_array? For example, numpy_array = np.asarray(array)? That's the way I am returning my c++ vectors in python class properties. Is it wrong? — GioR, Sep 22 '17 at 16:48
Move is a relatively recent addition to the c++ standard (c++11 I think) so if you have an older compiler it may not work. There may be an option you can pass to the compiler to turn it on though. If you pass the vector directly to numpy array it will be copied twice, first to a Python list then to the numpy array. It's not wrong, but if the vector is large then the copying will be inefficient. — DavidW, Sep 22 '17 at 17:11
@DavidW I have a similar problem. I will ask my own question for sure. I am trying to send a deque of tuples from C++ to Cython and I get a segmentation fault. I need to move my deque of tuples to a cython deque of tuples but I get a compiler error when I do that. Should I modify the move function to reflect my requirement ? — gansub, Aug 02 '19 at 09:25
@gansub It's hard to know without seeing your code. It's sometimes worth just telling Cython about a specific overload for `move` - for example `vector[int] move(vector[int])` - rather than trying to wrap the template move, just because Cython doesn't handle templates very well. — DavidW, Aug 02 '19 at 10:06
One also could use `std::vector<>.swap` instead of `std::move` (which probably works out of the box). — ead, Jan 09 '20 at 22:12

Florian Weimer · Answer 2 · 2017-07-17T08:13:29.700

When you return from doit, the WhyNot object goes out of scope, and the array elements are deallocated. This means that &WhyNot[0] is no longer a valid pointer. You need to store the WhyNot object somewhere else, probably in a place provided by the caller.

One way to do this is to split doit into three functions, doit_allocate which allocates the vector and returns a pointer to it, doit as before (but with an argument which receives a pointer to the preallocated vector, anddoit_free` which deallocates the vector.

Something like this:

vector<int> *
doit_allocate()
{
    return new vector<int>;
}

int *
doit(vector<int> *WhyNot, int length)
{
    // Something really heavy
    cout << "C++: doing it fast " << endl; 

    // Heavy stuff - like reading a big file and preprocessing it
    for(int i=0; i<length; ++i)
        WhyNot->push_back(i); // heavy stuff

    cout << "C++: did it really fast" << endl;
    return WhyNot->front();
}

void
doit_free(vector<int> *WhyNot)
{
    delete WhyNot;
}

Right. It'd be cool if you could elaborate on how to do this in Cython e.g. what modification to the code? I have never dealt with vectors in Cython. — NULL, Jul 16 '17 at 22:04
Thanks for the edit. I meant how do deal with vector to numpy in Cython. @DavidW response seems to address this. — NULL, Jul 18 '17 at 15:03

Passing C++ vector to Numpy through Cython without copying and taking care of memory management automatically

2 Answers2

Linked

Related