
Let's say I want to implement Numpy's

x[:] += 1

in Cython. I could write

@cython.boundscheck(False)
@cython.wraparound(False)
def add1(np.ndarray[np.float32_t, ndim=1] x):
    cdef unsigned long i
    for i in range(len(x)):
        x[i] += 1

However, this only works with ndim = 1. I could use

add1(x.reshape(-1))

but this only works with contiguous x.
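A quick pure-NumPy check of why the reshape trick breaks on a sliced array (a sketch, not part of the benchmark below): reshaping a non-contiguous view forces a copy, so in-place writes never reach the original.

```python
import numpy as np

a = np.zeros((4, 4))
b = a[1:-1, 1:-1]        # non-contiguous view into a
flat = b.reshape(-1)     # NumPy must copy here, since b is not contiguous
flat += 1                # the writes land in the copy, not in a
```

After this runs, `b` (and `a`) are still all zeros, while `flat` holds the incremented copy.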

Does Cython offer any reasonably easy and efficient way to iterate Numpy arrays as if they were flat?

(Reimplementing this particular operation in Cython doesn't make sense, as the above Numpy's code should be as fast as it gets -- I'm just using this as a simple example)

UPDATE:

I benchmarked the proposed solutions:

@cython.boundscheck(False)
@cython.wraparound(False)
def add1_flat(np.ndarray x):
    cdef unsigned long i
    for i in range(x.size):
        x.flat[i] += 1

@cython.boundscheck(False)
@cython.wraparound(False)
def add1_nditer(np.ndarray x):
    it = np.nditer([x], op_flags=[['readwrite']])
    for i in it:
        i[...] += 1

The second function requires import numpy as np in addition to cimport. The results are:

a = np.zeros((1000, 1000))

b = a[100:-100, 100:-100]

%timeit b[:] += 1
1000 loops, best of 3: 1.31 ms per loop

%timeit add1_flat(b)
1 loops, best of 3: 316 ms per loop

%timeit add1_nditer(b)
1 loops, best of 3: 1.11 s per loop

So, they are roughly 240 and 850 times slower than Numpy, respectively.

UPDATE 2:

The add11 version (from the accepted answer) uses a for loop nested inside another, and so doesn't iterate the array as if it were flat. However, it is as fast as Numpy in this case:

%timeit add1.add11(b)
1000 loops, best of 3: 1.39 ms per loop

On the other hand, add1_ravel, one of the proposed solutions, fails to modify the contents of b.

MWB
  • These timings are for the compiled cython code? – hpaulj Aug 21 '16 at 00:31
  • @hpaulj Yes, unless I screwed up somehow (I'm new to Cython). These definitions are in myfile.pyx; setup.py includes setup(ext_modules =cythonize("myfile.pyx")), and I call "python setup.py build_ext --inplace" before importing myfile into ipython. – MWB Aug 21 '16 at 02:26
  • @hpaulj there's no way to tell Cython the type of x's elements. If I specify it, Cython assumes ndim=1. Plus, I think that flat and nditer are Python constructs that Cython just calls via Python – MWB Aug 21 '16 at 02:36
  • There is [numpy.ravel](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ravel.html) that might be suitable – J.J. Hakala Aug 21 '16 at 06:57

3 Answers


http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html

is a nice tutorial on using nditer, and it ends with a cython version. nditer is meant to be the all-purpose array iterator in numpy C-level code.

There are also good array examples on the Cython memoryview page:

http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html

http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html

The data buffer of an ndarray is a flat buffer. So regardless of the array's shape and strides, you can iterate over the whole buffer in a flat C-pointer fashion. Tools like nditer and memoryviews take care of the element-size details. So in C-level code it is actually easier to step through all the elements in a flat fashion than to step by rows, since going by rows has to take the row stride into account.
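That flat-buffer arithmetic can be sketched in pure NumPy. Note that `strides` are in bytes, so they are divided by the itemsize here; `ctypes.data` gives the buffer's base address.

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)
b = a[:, 1:3]                          # strided, non-contiguous view

flat = a.ravel()                       # a is contiguous, so this is a view
start = (b.ctypes.data - a.ctypes.data) // b.itemsize
si, sj = (s // b.itemsize for s in b.strides)

# every element of b, read straight out of the flat buffer via strides
for i in range(b.shape[0]):
    for j in range(b.shape[1]):
        assert flat[start + i * si + j * sj] == b[i, j]
```

For this `b`, the offsets work out to `start = 1`, `si = 4`, `sj = 1`, mirroring the element layout of the parent buffer.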

This runs in Python, and I think it will translate nicely into cython (I don't have that setup on my machine at the moment):

import numpy as np

def add1(x):
    it = np.nditer([x], op_flags=[['readwrite']])
    for i in it:
        i[...] += 1
    return it.operands[0]

x = np.arange(10).reshape(2,5)
y = add1(x)
print(x)
print(y)

https://github.com/hpaulj/numpy-einsum/blob/master/sop.pyx is a sum-of-products script that I wrote a while back to simulate einsum.

The core of its w = sop(x,y) calculation is:

it = np.nditer(ops, flags, op_flags, op_axes=op_axes, order=order)
it.operands[nop][...] = 0
it.reset()
for xarr, yarr, warr in it:
    x = xarr
    y = yarr
    w = warr
    size = x.shape[0]
    for i in range(size):
        w[i] = w[i] + x[i] * y[i]
return it.operands[nop]

===================

Copying ideas from the nditer tutorial, I got a version of add1 that is only half the speed of the native numpy a+1. The naive nditer (above) isn't much faster in cython than in python; a lot of the speedup may be due to the external loop.

@cython.boundscheck(False)
def add11(arg):
    cdef np.ndarray[double] x
    cdef int size
    cdef double value
    it = np.nditer([arg],
         flags=['external_loop','buffered'], 
         op_flags=[['readwrite']])
    for xarr in it:
        x = xarr
        size = x.shape[0]
        for i in range(size):
            x[i] = x[i]+1.0
    return it.operands[0]

I also coded this nditer in python with a print of size, and found that it iterated over your b in 78 blocks of size 8192 - that's the default buffer size, not some characteristic of b and its data layout.
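That block structure is easy to see from Python. A readonly sketch (buffered readwrite would also need writeback handling on newer NumPy, so the read-only flag keeps the demo simple):

```python
import numpy as np

a = np.zeros((1000, 1000))
b = a[100:-100, 100:-100]              # 800 x 800 non-contiguous view

# external_loop hands out 1-d chunks; buffered gathers the strided data
# into buffer-sized blocks instead of row-sized ones
it = np.nditer([b], flags=['external_loop', 'buffered'],
               op_flags=[['readonly']])
sizes = [chunk.shape[0] for chunk in it]
```

The 640000 elements of `b` arrive mostly in full 8192-element buffers, with one short block at the end, regardless of `b`'s row length.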

In [15]: a = np.zeros((1000, 1000)) 
In [16]: b = a[100:-100, 100:-100]

In [17]: timeit add1.add11(b)
100 loops, best of 3: 4.48 ms per loop

In [18]: timeit b[:] += 1
100 loops, best of 3: 8.76 ms per loop

In [19]: timeit add1.add1(b)    # for the unbuffered nditer 
1 loop, best of 3: 3.1 s per loop

In [21]: timeit add1.add11(a)
100 loops, best of 3: 5.44 ms per loop
hpaulj
  • Thanks, I'll read thru the tutorials and see if they answer my question. I think the answer would be improved if it included an actual implementation of `x[:] += 1` (that is efficient, works with any number of dimensions and x that's not contiguous) – MWB Aug 20 '16 at 22:20
  • I've added a faster cython version. – hpaulj Aug 21 '16 at 04:32
  • It's a bit unsatisfying because the stated goal is to iterate `x` **as if it were flat**. Nested loops don't exactly do that, although perhaps your solution is about as good as it gets. – MWB Aug 21 '16 at 04:58
  • The only nesting here is determined by the `nditer` buffer size. Check my edits. – hpaulj Aug 21 '16 at 06:44

You could try using the flat attribute of the ndarray, which provides an iterator over the flattened array. It always iterates in row-major (C) order, with the last index varying fastest. Something like:

for i in range(x.size):
    x.flat[i] += 1
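Unlike the reshape trick in the question, this does write through to non-contiguous views (slowly, per the benchmarks above, but correctly). A minimal check:

```python
import numpy as np

a = np.zeros((5, 5))
b = a[1:-1, 1:-1]            # non-contiguous 3x3 view
for i in range(b.size):
    b.flat[i] += 1           # the flatiter resolves the strides; writes hit a
```

After the loop, the interior of `a` has been incremented in place.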
bnaecker
  • Interesting, although I suspect pretty inefficient. I'll run some tests later. – MWB Aug 20 '16 at 21:12
  • @MaxB Seems like you're right about the inefficiency. A quick `%timeit` in IPython shows that it takes about 10% more time than just iterating over all the dimensions of the array. Also, I was incorrect about the ordering, the `flat` attribute always returns an iterator over the elements in C ordering. – bnaecker Aug 20 '16 at 21:18
  • Another option could be to create a reshaped view onto `x`'s memory and use the solution you proposed. E.g., `new_array = x.view().reshape(-1, 1)`. – bnaecker Aug 20 '16 at 21:26
  • AFAIK that doesn't work with non-contiguous arrays (I mention it in the question). Modifying the contents of a view, if it's not contiguous, does some kind of copy-on-write and the array it's viewing remains unchanged (I haven't tried it in Cython though) EDIT : actually, I suspect it copies at creation time – MWB Aug 20 '16 at 22:01

With numpy.ravel

numpy.ravel(a, order='C')

Return a contiguous flattened array.

A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.

@cython.boundscheck(False)
@cython.wraparound(False)
def add1_ravel(np.ndarray xs):
    cdef unsigned long i
    cdef double[::1] aview

    narray = xs.ravel()
    aview = narray

    for i in range(aview.shape[0]):
        aview[i] += 1

    # return xs if the memory is shared
    if not narray.flags['OWNDATA'] or np.may_share_memory(xs, narray):
        return xs

    # otherwise new array reshaped
    shape = tuple(xs.shape[i] for i in range(xs.ndim))
    return narray.reshape(shape)
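Whether the caller gets the mutated xs back or a fresh reshaped array therefore hinges on contiguity, which is why this approach cannot modify a sliced input in place. A quick check of when ravel copies:

```python
import numpy as np

a = np.zeros((4, 4))
contig = a[1:-1, :]          # contiguous block of whole rows
sliced = a[1:-1, 1:-1]       # non-contiguous slice

assert np.shares_memory(a, contig.ravel())      # view: in-place updates reach a
assert not np.shares_memory(a, sliced.ravel())  # copy: a stays untouched
```

This matches the asker's UPDATE 2: for a slice like b, the increment happens in a private copy and the returned array must be used instead.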
J.J. Hakala