10

I'm trying to understand how to work with nditer to do a reduction, in my case converting a 3d array into a 2d array.

I followed the help here http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html and managed to create a function that applies reduction over the last axis of the input. With this function

def nditer_sum(data, red_axes):
    it = numpy.nditer([data, None],
            flags=['reduce_ok', 'external_loop'],
            op_flags=[['readonly'], ['readwrite', 'allocate']],
            op_axes=[None, red_axes])
    it.operands[1][...] = 0

    for x, y in it:
        y[...] = x.sum()

    return it.operands[1]

I can get something equivalent to data.sum(axis=2)

>>> data = numpy.arange(2*3*4).reshape((2,3,4))
>>> nditer_sum(data, [0, 1, -1])
[[ 6 22 38]
[54 70 86]]
>>> data.sum(axis=2)
[[ 6 22 38]
[54 70 86]]

So to get something equivalent to data.sum(axis=0) I though that it was enough to change the argument red_axes to [-1, 0,1] But the result is quite different.

>>> data = numpy.arange(2*3*4).reshape((2,3,4))
>>> data.sum(axis=0)
[[12 14 16 18]
 [20 22 24 26]
 [28 30 32 34]]
>>> nditer_sum(data, [-1, 0, 1])
[[210 210 210 210]
 [210 210 210 210] 
 [210 210 210 210]]

In the for loop inside nditer_sum (for x,y in it:), the iterator is looping 2 times and giving an array of length 12 each time, instead of looping 12 times and giving an array of length 2 each time. I have read the numpy documentation several times and googled about this to no avail. I'm using numpy 1.6 and python 2.7

Sergio
  • 677
  • 1
  • 10
  • 20
  • -1 in op_axes is documented as "new axis", is this what you are trying to do? Also documentation feeds [[size x], [size y], [size z]] into op_axes, while you push [None, [size 3]], is that intended? – Dima Tisnek Oct 08 '12 at 19:50
  • The [documentation](http://docs.scipy.org/doc/numpy/reference/generated/numpy.nditer.html) says "an operand is a mapping from the dimensions of the iterator to the dimensions of the operand"... whatever that means. In the current example I have copied the code in the [iterating over arrays tutorial](http://docs.scipy.org/doc/numpy-dev/reference/arrays.nditer.html#reduction-iteration), that works, but only with the last axis. In the example, the 3d array has op_axes None (which seems equivalent to [-1, -1, -1]) and the 2d axis has [0, 1, -1] – Sergio Oct 17 '12 at 16:44
  • I though that changing [0,1,-1] to [-1, 0, 1] would do the reduction on the first axis, but it doesn't work. My question is how to do the reduction on an arbitrary axis. – Sergio Oct 17 '12 at 16:49
  • Send this very question to developers and get back here? I'd like to know too. – Dima Tisnek Oct 30 '12 at 10:19

2 Answers2

1

The axis=0 case works correctly if the nditer order is changed to F. Now there are 12 steps with arrays of size (2,) as you wanted.

it = np.nditer([data, None],
        flags=['reduce_ok', 'external_loop'],
        op_flags=[['readonly'], ['readwrite', 'allocate']],
        order='F',     # ADDED to loop starting with the last dimension
        op_axes=[None, red_axes])

But there isn't a solution like this for middle axis=1 case.


Another approach to iterating on selected dimensions is to construct a 'multi_index' iterator on a reduced dimensional array. I discovered in https://stackoverflow.com/a/25097271/901925 that np.ndindex uses this trick to perform a 'shallow iteration'.

For the axis=0 case, this function works:

def sum_1st(data):
    y = np.zeros(data.shape[1:], data.dtype)
    it = np.nditer(y, flags=['multi_index'])
    while not it.finished:
        xindex = tuple([slice(None)]+list(it.multi_index))
        y[it.multi_index] = data[xindex].sum()
        it.iternext()
    return y

Or generalized to any axis:

def sum_any(data, axis=0):
    yshape = list(data.shape)
    del yshape[axis]
    y = np.zeros(yshape, data.dtype)
    it = np.nditer(y, flags=['multi_index'])
    while not it.finished:
        xindex = list(it.multi_index)
        xindex.insert(axis, slice(None))
        y[it.multi_index] = data[xindex].sum()
        it.iternext()
    return y
Community
  • 1
  • 1
hpaulj
  • 221,503
  • 14
  • 230
  • 353
0

Changing the line y[...] = x.sum() to y[...] += x fixes it (as in the example here).

Benjamin
  • 11,560
  • 13
  • 70
  • 119
  • But that's not what I want to do. I'm using a more complex function that does a bunch of operations on x. Imagine that instead of the sum is "the sum after clipping the 10% lower and higher of x" – Sergio Nov 13 '12 at 13:11
  • I'm just saying that the fix makes the behaviour consistent with numpy.sum with any axis specified, which seemed to be the question... Maybe you should add details to the question. – Benjamin Nov 13 '12 at 13:54