I'm using Cython to wrap a C++ library. The C++ code holds a list of 3D vectors in a `std::vector<std::array<double, 3>>`. To convert it into a Python object, I currently loop over the vector and call the `arrayd3ToNumpy` method from the answer to my previous question on each element. This is quite slow when the vector is very large. I'm less worried about copying the data (I believe the automatic coercion of a vector to a list makes a copy anyway) than about the speed of the conversion.

Eney

1 Answer

In this case the data is a simple C type, contiguous in memory. I'd therefore expose it to Python via a wrapper cdef class that implements the buffer protocol.

There's a guide (with an example) in the Cython documentation. In your case you don't need to store ncols: it's always 3, as fixed by the array type.

The key advantage is that you can largely eliminate copying. You could use your type directly with Cython's typed memoryviews to quickly access the underlying data. You could also call `np.asarray` on it to create a NumPy array that just wraps the underlying data that already exists.
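As a rough illustration of what "wrapping without copying" means, here is a pure-Python stand-in that uses only the standard library. It assumes the same layout as the C++ data (n rows of 3 contiguous doubles); the names `flat`, `view` and `n` are made up for the example, and `array.array` merely plays the role of the wrapped vector.

```python
from array import array

n = 4  # number of 3D vectors (hypothetical size)

# Flat, contiguous storage of n * 3 doubles: x0, y0, z0, x1, y1, z1, ...
# This stands in for the memory owned by the C++ vector.
flat = array("d", [10.0 * i + j for i in range(n) for j in range(3)])

# Reinterpret the same memory as an (n, 3) array. cast() does not copy;
# it only attaches new shape/stride information to the existing buffer.
view = memoryview(flat).cast("B").cast("d", shape=[n, 3])

# Bytes between consecutive rows and between consecutive columns.
row_stride, col_stride = view.strides
print(view.shape, row_stride, col_stride)

# Writing through the view changes the original storage, which shows
# that both names refer to the same memory.
view[2, 1] = -1.0
changed = flat[2 * 3 + 1]
```

A buffer-protocol wrapper around the vector would hand out the same kind of shape and stride description, which is what lets `np.asarray` or a typed memoryview use the data in place.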

DavidW
  • Thanks for the suggestion. In the Cython docs example, the matrix is vectorized into a 1D vector. This makes sense to me, and I get how it is indexed using shape and stride. Do I need to flatten/vectorize `std::vector<std::array<double, 3>>` into a 1D vector? I tried without doing this, and modifying the example to, e.g., `self.shape[0] = self.v.size()` and `self.shape[1]=3`, and accessing elements with 2 indices, e.g., `buffer.buf = &(self.v[0][0])` but I'm getting segfaults doing this. – Eney Sep 13 '22 at 22:13
  • I'd probably do `self.shape[0] = self.v.size()`, `self.shape[1] = 3`, `buffer.buf = self.v.data().data()`. I suspect the segfaults are most likely because the stride is wrong, but that's a guess. There should be no need to flatten it. – DavidW Sep 13 '22 at 22:23
  • Found the issue. I kept the `__cinit__()` from the linked example, just deleting the input argument and adding the line `self.ncols=3`. My code only had a declaration + call to a method that moved the data into `self.v`, similar to your example [here](https://stackoverflow.com/a/45150611/9518875). I needed to instantiate an instance in between. This works very well. Would you suggest using a wrapper+buffer protocol for all vectors vs. using Cython's auto coercion of vector to list? Would you have to create a new (almost identical) wrapper for every `vector< type >`? – Eney Sep 14 '22 at 14:04
  • vector+buffer protocol works (and I recommend it) when you have a clear memory layout. It'd be ideal for a vector of double/int, or a vector of std::array or similar. It wouldn't work for a vector of vectors, or a vector of some arbitrary C++ class. So it's useful sometimes but not always. When it works it's probably better than auto-coercion. You would have to create a new wrapper - Cython doesn't have "template classes". I'd probably do this by string substitution (i.e. have some simple program generate the source file) – DavidW Sep 14 '22 at 17:02
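The string-substitution idea from the last comment could look roughly like the following Python generator. It is only a sketch: the embedded Cython template is loosely modeled on the buffer example in the Cython documentation, covers plain 1D vectors of a scalar type, and has not been compiled here. The class names (`DoubleVector`, `IntVector`) and the `WRAPPERS` table are hypothetical.

```python
from string import Template

# Cython source template (untested sketch). $cls, $ctype and $fmt are the
# only parts that differ between the generated wrappers.
WRAPPER_TEMPLATE = Template("""\
# distutils: language = c++
from cpython cimport Py_buffer
from libcpp.vector cimport vector

cdef class $cls:
    cdef vector[$ctype] v
    cdef Py_ssize_t[1] shape
    cdef Py_ssize_t[1] strides

    def __getbuffer__(self, Py_buffer *buffer, int flags):
        self.shape[0] = self.v.size()
        self.strides[0] = sizeof($ctype)
        buffer.buf = <char *>self.v.data()
        buffer.format = '$fmt'
        buffer.internal = NULL
        buffer.itemsize = sizeof($ctype)
        buffer.len = self.v.size() * sizeof($ctype)
        buffer.ndim = 1
        buffer.obj = self
        buffer.readonly = 0
        buffer.shape = self.shape
        buffer.strides = self.strides
        buffer.suboffsets = NULL

    def __releasebuffer__(self, Py_buffer *buffer):
        pass
""")

# One entry per wrapper to generate (hypothetical names; fmt is the
# struct-style format character for the element type).
WRAPPERS = [
    {"cls": "DoubleVector", "ctype": "double", "fmt": "d"},
    {"cls": "IntVector", "ctype": "int", "fmt": "i"},
]

# Map each class name to its generated Cython source text.
sources = {spec["cls"]: WRAPPER_TEMPLATE.substitute(spec) for spec in WRAPPERS}

print(sources["IntVector"])
```

Each generated string would then be written to its own `.pyx` file (or concatenated into one) before running Cython, so a new element type only needs a new entry in the table.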