Let's say I want to implement Numpy's
x[:] += 1
in Cython. I could write
cimport cython
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)  # the directive is named wraparound, not wraparoundcheck
def add1(np.ndarray[np.float32_t, ndim=1] x):
    cdef unsigned long i
    for i in range(len(x)):
        x[i] += 1
However, this only works with ndim = 1. I could use
add1(x.reshape(-1))
but this only works with contiguous x: for a non-contiguous array, reshape(-1) returns a copy, so x itself is left unchanged.
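To make the reshape limitation concrete, here is a small pure-NumPy sketch (no Cython involved). The array names and sizes are only illustrative; it checks whether reshape(-1) shares memory with its input for a contiguous array and for a non-contiguous slice.

```python
import numpy as np

a = np.zeros((4, 4), dtype=np.float32)
b = a[1:-1, 1:-1]                 # 2x2 view with gaps between rows

flat_b = b.reshape(-1)            # no single stride fits, so this is a copy
flat_b += 1                       # increments the copy only
b_unchanged = not b.any()         # True: b still holds zeros

flat_a = a.reshape(-1)            # a is C-contiguous, so this is a view
flat_a += 1                       # increments a itself
a_changed = bool(a.all())         # True: every element of a is now 1

print(b.flags['C_CONTIGUOUS'], np.shares_memory(b, flat_b), b_unchanged)
print(a.flags['C_CONTIGUOUS'], np.shares_memory(a, flat_a), a_changed)
```

So passing x.reshape(-1) to add1 silently operates on a temporary whenever x is a strided slice.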
Does Cython offer any reasonably easy and efficient way to iterate over NumPy arrays as if they were flat?
(Reimplementing this particular operation in Cython doesn't make sense, as the NumPy code above should be as fast as it gets -- I'm just using this as a simple example.)
UPDATE:
I benchmarked the proposed solutions:
@cython.boundscheck(False)
@cython.wraparound(False)
def add1_flat(np.ndarray x):
    cdef unsigned long i
    for i in range(x.size):
        x.flat[i] += 1

@cython.boundscheck(False)
@cython.wraparound(False)
def add1_nditer(np.ndarray x):
    it = np.nditer([x], op_flags=[['readwrite']])
    for i in it:
        i[...] += 1
The second function requires import numpy as np in addition to the cimport. The results are:
a = np.zeros((1000, 1000))
b = a[100:-100, 100:-100]
%timeit b[:] += 1
1000 loops, best of 3: 1.31 ms per loop
%timeit add1_flat(b)
1 loops, best of 3: 316 ms per loop
%timeit add1_nditer(b)
1 loops, best of 3: 1.11 s per loop
So, they are roughly 240 and 850 times slower than NumPy, respectively.
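Both functions above pay for a Python-level operation on every element. As a rough sketch of an alternative (not one of the proposed solutions, and not timed here), nditer can hand back whole 1-D chunks instead of single elements through its external_loop flag. The helper name add1_chunks is hypothetical, and the with-statement form assumes NumPy 1.15 or later.

```python
import numpy as np

def add1_chunks(x):
    """Hypothetical helper: add 1 in place, one 1-D chunk at a time."""
    # external_loop makes the iterator yield 1-D arrays rather than scalars,
    # so the loop body runs once per chunk instead of once per element.
    with np.nditer(x, flags=['external_loop'], op_flags=['readwrite']) as it:
        for chunk in it:
            chunk[...] += 1

a = np.zeros((1000, 1000))
b = a[100:-100, 100:-100]   # same non-contiguous view as in the benchmark
add1_chunks(b)
```

Each chunk is a view into the original buffer, so the update reaches b (and therefore a) even though b is not contiguous.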
UPDATE 2:
The add11 version uses a for loop inside of a for loop, and so doesn't iterate the array as if it were flat. However, it is as fast as NumPy in this case:
%timeit add1.add11(b)
1000 loops, best of 3: 1.39 ms per loop
On the other hand, add1_unravel, one of the proposed solutions, fails to modify the contents of b.