
I'm new to C extensions for NumPy and I'm wondering if the following workflow is possible.

  1. Pre-allocate an array in NumPy
  2. Pass this array to a C extension
  3. Modify array data in-place in C
  4. Use the updated array in Python with standard NumPy functions
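As a rough sketch of what step 3 could look like on the C side, the function below only receives a pointer into memory the caller already owns and writes results back through it, so nothing is copied. It is plain C, not a complete extension module. Inside a real extension, the pointer, length and byte stride would presumably come from the ndarray via the NumPy C API macros `PyArray_DATA`, `PyArray_DIM` and `PyArray_STRIDE`, after parsing the argument with something like `PyArg_ParseTuple(args, "O!", &PyArray_Type, &arr)` and calling `import_array()` in the module init. You would also want to check that the dtype really is int32 first. The name `square_inplace_i32` is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel for step 3: square every int32 element in place.
   The caller owns the buffer; this function only borrows a pointer to it,
   so the results land directly in the caller's memory (no copy).
   In an extension: data = PyArray_DATA(arr), n = PyArray_DIM(arr, 0),
   stride = PyArray_STRIDE(arr, 0). */
static void square_inplace_i32(char *data, ptrdiff_t n, ptrdiff_t stride)
{
    for (ptrdiff_t k = 0; k < n; ++k) {
        /* stride is measured in bytes, as in NumPy */
        int32_t *p = (int32_t *)(data + k * stride);
        *p = *p * *p;
    }
}
```

Because the stride is measured in bytes, the same loop also works on non-contiguous 1-D views such as `arr[::2]`, where consecutive elements are 8 bytes apart instead of 4.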

In particular, I'd like to do this while ensuring I'm making zero new copies of the data at any step.

I'm familiar with boilerplate on the C side such as PyModuleDef, PyMethodDef, and the PyObject* arguments, but a lot of examples I've seen involve coercion to C arrays, which to my understanding involves copying and/or casting. I'm also aware of Cython, though I don't know if it does similar coercions or copies under the hood. I'm specifically interested in simple indexed get and set operations on an ndarray with numeric (e.g. int32) values.
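For indexed get and set on a 2-D int32 array, the element address is the data pointer plus each index times the corresponding byte stride, which is the arithmetic NumPy's `PyArray_GETPTR2` macro performs. The helpers below (hypothetical names `i32_at`, `get_i32`, `set_i32`) are a minimal sketch on a raw buffer; in an extension, `data` and `strides` would presumably come from `PyArray_BYTES` and `PyArray_STRIDES`, and NumPy's own stride type is `npy_intp` rather than the `ptrdiff_t` used here.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of element (i, j) in a 2-D int32 buffer described by byte strides.
   Same arithmetic as PyArray_GETPTR2: data + i*strides[0] + j*strides[1]. */
static int32_t *i32_at(char *data, const ptrdiff_t strides[2],
                       ptrdiff_t i, ptrdiff_t j)
{
    return (int32_t *)(data + i * strides[0] + j * strides[1]);
}

/* Read element (i, j) directly from the shared buffer. */
static int32_t get_i32(char *data, const ptrdiff_t strides[2],
                       ptrdiff_t i, ptrdiff_t j)
{
    return *i32_at(data, strides, i, j);
}

/* Write element (i, j) in place; the caller sees the change immediately. */
static void set_i32(char *data, const ptrdiff_t strides[2],
                    ptrdiff_t i, ptrdiff_t j, int32_t value)
{
    *i32_at(data, strides, i, j) = value;
}
```

Since only the strides differ, a transposed view such as `arr.T` can be read and written through the same buffer without copying; swapping the two strides is enough.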

Could someone provide a minimal working example of creating a NumPy array, modifying it in-place in a C extension, and using the results in Python subsequently?

  • What kinds of modification do you have in mind? How familiar are you with the `numpy` data model? Its use of `shape`, `strides` and `dtype` to access elements in the `data-buffer`? – hpaulj Dec 21 '21 at 06:35
  • cython doesn't create new copies of numpy arrays (unless you specifically create them yourself), see [Working with NumPy](https://cython.readthedocs.io/en/latest/src/tutorial/numpy.html) – Ahmed AEK Dec 21 '21 at 06:42
  • @hpaulj not very; would [Array API](https://numpy.org/doc/stable/reference/c-api/array.html) be a good place to start? – kiteflyer96 Dec 21 '21 at 15:15
  • @AhmedAEK is cython preferred to raw C extension in general? – kiteflyer96 Dec 21 '21 at 15:18


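Cython doesn't create new copies of NumPy arrays unless you explicitly ask for one (for example by calling a NumPy function that returns a new array), so it is as efficient as it can be when working with NumPy arrays; see [Working with NumPy](https://cython.readthedocs.io/en/latest/src/tutorial/numpy.html).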

Choosing between writing a raw C module and using Cython depends on the purpose of the module. If you are writing a module that will only be used from Python to do a small, specific task with NumPy arrays as fast as possible, then by all means use Cython: it registers the module correctly for you, handles memory and prevents common mistakes people make when writing C code (such as memory-management problems), sets up the compiler includes, and gives easier access to complicated functionality (like NumPy iterators).

However, if your module is going to be used from other languages and has to run independently of Python, must be callable from Python without any overhead, implements complex C data structures and requires a lot of C functionality, then by all means write your own C extension (or even a plain DLL). You can then pass pointers to NumPy arrays from Python via ctypes (the numpy.ctypeslib module provides helpers such as ndpointer and as_ctypes_type), pass the Python object itself and return it (but then you must build a .pyd/.so extension module instead of a DLL), or even create the NumPy array on the C side and have it managed by Python (but then you will have to understand the NumPy C API).
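For the ctypes route, the C side can be an ordinary function with no Python or NumPy headers at all. A minimal sketch, assuming a C-contiguous int32 array; the name `add_scalar_i32` is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Plain C function, buildable as an ordinary shared library.
   `data` must point to `n` contiguous int32 values owned by the caller
   (e.g. a NumPy array's buffer); they are updated in place. */
void add_scalar_i32(int32_t *data, size_t n, int32_t value)
{
    for (size_t k = 0; k < n; ++k) {
        data[k] += value;
    }
}
```

From Python one could then load the compiled library with `ctypes.CDLL(...)`, set the function's `argtypes` to `[np.ctypeslib.ndpointer(dtype=np.int32, flags="C_CONTIGUOUS"), ctypes.c_size_t, ctypes.c_int32]` and call it with the array and `arr.size`. ctypes passes the array's data pointer rather than a copy, so the array is updated in place, and `ndpointer` rejects arrays with the wrong dtype or layout instead of silently converting them.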

Ahmed AEK
  • Thanks, I ended up going with [scikit-build](https://scikit-build.readthedocs.io/en/latest/index.html) and a C++ extension. – kiteflyer96 Dec 22 '21 at 04:45