-1
  • I want to write a C extension library for Python, with the aim of replacing Python code with C.
  • The Python code has several lines like the ones below:
import numpy as np
a = np.array([1,3,12,0.43,234,-3,-4])
b = a[[1,3,5]]
print(b)

# array([ 3.  ,  0.43, -3.  ])

  • Unlike indexing a numpy array with an int, this example uses an array (a list of positions) as the index.

  • What is the name of the corresponding C-API function for getting the elements at the designated indexes of a given numpy array?

  • The NumPy C-API reference is at numpy c-api

  • Thanks very much.

dongrixinyu
  • 1
    @MechanicPig: No, that's a C API version of the `numpy.ndarray.item` method. It doesn't do what the question is asking for. – user2357112 Sep 30 '22 at 03:11
  • 1
    `PyObject_GetItem` will work, although there won't be a huge benefit to using the C API vs writing it in Python. I don't think there's a direct Numpy C API function available for every individual operation – DavidW Sep 30 '22 at 07:04
  • The NumPy C API is quite fast, but it accounts for only a small fraction of my Python code. There is still a large amount of pure Python code to be rewritten in C to speed it up. – dongrixinyu Sep 30 '22 at 07:27
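
To make the `PyObject_GetItem` suggestion from the comments concrete: `operator.getitem` is the Python-level spelling of the same generic item lookup, so the sketch below shows what such a call would compute. It is only an illustration in Python, not C extension code; the variable names are made up for the example.

```python
import operator

import numpy as np

a = np.array([1, 3, 12, 0.43, 234, -3, -4])
index = [1, 3, 5]  # a list used as the index ("fancy" indexing)

# operator.getitem(a, index) is the same operation as a[index];
# from C, the generic item protocol is reached via PyObject_GetItem.
b = operator.getitem(a, index)
print(b)
```

From C this would mean building the index object first and then making one generic call, which is why the comment expects little gain over plain Python.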

2 Answers

2

High-level Numpy functions are not meant to be called from C. In fact, not all of them are implemented in C (some are implemented in pure Python and call other high-level functions that are themselves implemented in C). There is also not much benefit in calling them from C, apart from reusing existing code, and that code would not be very efficient compared to what can be implemented in C directly.

Numpy provides a fairly minimal interface (the one you linked) so that modules can operate on Numpy arrays. In practice, low-level C modules often extract a pointer with PyArray_GetPtr (or more specific macros like PyArray_GETPTR2, after checking the array flags to be safe) and operate directly on the array buffer based on the strides (extracted with PyArray_STRIDES). A Numpy array is just a big raw buffer with a fixed size and some metadata. Views add more information, such as the number of dimensions, the shape, the strides, etc. If possible, it is better to check that the array is contiguous and write code for contiguous arrays. Indeed, compilers tend not to generate fast code for generic strided access, even when a stride happens to be 1 at run time: it is your responsibility to optimize this case in C (this is what Numpy does in its functions, but this part introduces some overhead). Code optimized to operate on contiguous arrays can be much faster (mainly due to the possible use of SIMD instructions and faster indexing instructions).
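
The "raw buffer plus strides" idea can be sketched without Numpy at all. The snippet below is a Python stand-in for the loop one would write in C: `gather_strided` and its parameters are hypothetical names invented for this illustration, and `struct` plays the role of the pointer arithmetic over a buffer of C doubles.

```python
import struct

ITEM = struct.calcsize("d")  # size of one C double in bytes


def gather_strided(buf, offset, stride, count, indices):
    """Pick doubles at `indices` from a 1-D strided view of `buf`.

    `offset` and `stride` are in bytes, `count` is the view length.
    """
    out = []
    for i in indices:
        if i < 0:
            i += count  # Python-style negative index
        if not 0 <= i < count:
            raise IndexError(f"index {i} is out of bounds for size {count}")
        # Element i starts at byte `offset + i * stride`.
        (value,) = struct.unpack_from("d", buf, offset + i * stride)
        out.append(value)
    return out


values = [1, 3, 12, 0.43, 234, -3, -4]
buf = struct.pack(f"{len(values)}d", *values)  # contiguous buffer

# Contiguous case: the stride equals the item size.
print(gather_strided(buf, 0, ITEM, len(values), [1, 3, 5]))

# A view over every second element: same buffer, doubled stride.
print(gather_strided(buf, 0, 2 * ITEM, 4, [1, 3]))
```

In C the contiguous case would typically get its own loop with plain `data[i]` indexing, which is the specialization the paragraph above recommends.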

While you could use PyArray_GETITEM, the resulting code will be significantly slower than direct access because of possible checks, inefficient generic indexing (which breaks many compiler optimizations), and because the function will almost certainly not be inlined by the compiler. It might be faster than pure-Python code, but not by a large margin (probably similar to Cython code that does not use direct indexing).

Jérôme Richard
  • I found that the numpy expression in my question is equivalent to the numpy C function `PyArray_Choose`. You can try `PyObject *PyArray_Choose(PyArrayObject *self, PyObject *op, PyArrayObject *ret, NPY_CLIPMODE clipmode)` and check it. Thank you for your answer – dongrixinyu Oct 04 '22 at 10:50
  • 1
    The thing is that this function deals with objects, so the benefit should be small compared to calling it from CPython (it will still be a bit slow on small arrays compared to native C operations, and equally fast on big ones). If you want short, clear code, then using CPython is better. If you want fast code, then native loops should be much faster. Calling such functions from C makes your code less robust and more complex while not being significantly faster. This was the whole point of my answer. – Jérôme Richard Oct 04 '22 at 11:23
  • 1
    @dongrixinyu I believe `PyArray_Choose` is significantly slower than just doing the indexing with a list *because you're misusing the function*. – DavidW Oct 04 '22 at 20:27
0
  • I finally found the numpy C function: its name is PyArray_Choose.
  • The complete function definition is PyObject *PyArray_Choose(PyArrayObject *self, PyObject *op, PyArrayObject *ret, NPY_CLIPMODE clipmode)
  • numpy choose
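
A quick way to check the claimed equivalence is to compare the Python-level counterparts. The sketch below is an illustration added for that purpose, not part of the original answer: it compares `a[idx]` with `np.choose` and also with `np.take`, which as far as I know is exposed in the C API as `PyArray_TakeFrom`. The variable names are made up for the example.

```python
import numpy as np

a = np.array([1, 3, 12, 0.43, 234, -3, -4])
idx = np.array([1, 3, 5])

by_index = a[idx]              # the expression from the question
by_take = np.take(a, idx)      # gather by position
by_choose = np.choose(idx, a)  # treats `a` as a list of 7 scalar "choices"
print(by_index, by_take, by_choose)

# Negative positions: indexing and take accept them,
# choose rejects them in its default mode="raise".
neg = np.array([-1, -2])
neg_index = a[neg]
neg_take = np.take(a, neg)
try:
    np.choose(neg, a)
    choose_accepts_negative = True
except ValueError:
    choose_accepts_negative = False
print(neg_index, neg_take, choose_accepts_negative)
```

The three results agree for the indices in the question, but `np.choose` reaches that result by a different route and stops matching once negative indices appear, so it is not a drop-in replacement for indexing.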
dongrixinyu
  • 2
    I don't think `np.choose(b, a)` is really equivalent to `a[b]`. You've forced `b` to be a numpy array rather than any sequence. You're also interpreting the array `a` as a sequence of 0d arrays. It'll kind of work, but will probably be pretty slow because you're using an array for the list argument and a list for the array argument. – DavidW Oct 04 '22 at 20:22
  • 1
    I measure it as ~13 times slower than just indexing. So using this C API function is a really poor "optimization" – DavidW Oct 04 '22 at 20:26