numpy provides a number of different ways of looping over one or more dimensions.
Your example:
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print(fiber_index)      # the (i, j) index pair
    print(A[fiber_index])   # the 1D fiber that func would operate on
produces something like:
(0, 0)
[0 1 2]
(0, 1)
[3 4 5]
(0, 2)
[6 7 8]
...
np.ndindex generates all the index combinations over the first 2 dimensions, giving your function the 1D fiber on the last.
Look at the code for ndindex; it's instructive. I tried to extract its essence in https://stackoverflow.com/a/25097271/901925. It uses as_strided to generate a dummy array over which an nditer iterates. It uses the 'multi_index' mode to generate an index set, rather than the elements of that dummy. The iteration itself is done with a __next__ method. This is the same style of indexing that is currently used in numpy compiled code.
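To see that essence in pure Python, here is a minimal sketch (my_ndindex is just an illustrative name, not numpy's actual implementation):
import numpy as np
from numpy.lib.stride_tricks import as_strided

def my_ndindex(shape):
    # a dummy array of the requested shape; all-zero strides mean every
    # element points at the same memory, so it costs essentially nothing
    dummy = as_strided(np.zeros(1), shape=shape, strides=(0,) * len(shape))
    it = np.nditer(dummy, flags=['multi_index'], order='C')
    while not it.finished:
        yield it.multi_index    # the index tuple, not the dummy's element
        it.iternext()

list(my_ndindex((2, 3)))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]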
The nditer documentation, Iterating Over Arrays (http://docs.scipy.org/doc/numpy-dev/reference/arrays.nditer.html), has a good explanation, including an example of doing the iteration in cython.
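For reference, the basic multi_index pattern from that page looks roughly like this (a sketch that prints each index tuple and the element it points at):
import numpy as np

a = np.arange(6).reshape(2, 3)
it = np.nditer(a, flags=['multi_index'])
while not it.finished:
    print(it.multi_index, it[0])   # e.g. (0, 0) 0, then (0, 1) 1, ...
    it.iternext()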
Many functions, among them sum, max, and product, let you specify which axis (or axes) you want to iterate over. Your example, with sum, can be written as:
np.sum(A, axis=-1)
np.sum(A, axis=(1,2)) # sum over 2 axes
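With the A implied by the output above (I'm assuming A = np.arange(27).reshape(3, 3, 3)), those give:
np.sum(A, axis=-1)
# array([[ 3, 12, 21],
#        [30, 39, 48],
#        [57, 66, 75]])
np.sum(A, axis=(1, 2))
# array([ 36, 117, 198])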
An equivalent is:
np.add.reduce(A, axis=-1)
np.add is a ufunc, and reduce specifies an iteration mode. There are many other ufuncs, and other iteration modes - accumulate, reduceat. You can also define your own ufunc.
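A few of those modes in action (the lambda-based ufunc is just a toy example of defining your own):
import numpy as np

x = np.arange(8)
np.add.reduce(x)              # 28, same as np.sum(x)
np.add.accumulate(x)          # running sum: [ 0  1  3  6 10 15 21 28]
np.add.reduceat(x, [0, 4])    # sums of x[0:4] and x[4:]: [ 6 22]

# a (slow) custom ufunc built from a plain Python function
myadd = np.frompyfunc(lambda a, b: a + b, 2, 1)
myadd(x, 10)                  # object array: [10 11 12 13 14 15 16 17]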
xnx suggests:
np.apply_along_axis(np.sum, 2, A)
It's worth digging through apply_along_axis to see how it steps through the dimensions of A. In your example, it steps over all possible i, j in a while loop, calculating:
outarr[(i,j)] = np.sum(A[(i, j, slice(None))])
Including slice objects in the indexing tuple is a nice trick. Note that it edits a list and then converts it to a tuple for indexing; that's because tuples are immutable.
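A stripped-down sketch of that trick (again assuming the 3x3x3 A from above):
import numpy as np

A = np.arange(27).reshape(3, 3, 3)
idx = [0] * A.ndim        # a mutable list of indices
idx[-1] = slice(None)     # take the whole last axis
A[tuple(idx)]             # same as A[0, 0, :] -> array([0, 1, 2])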
Your iteration can be applied along any axis by rolling that axis to the end. This is a 'cheap' operation, since it just changes the strides.
def with_ndindex(A, func, ax=-1):
    # apply func along axis ax of A
    A = np.rollaxis(A, ax, A.ndim)    # roll ax to the end (just changes strides)
    shape = A.shape[:-1]
    B = np.empty(shape, dtype=A.dtype)
    for ii in np.ndindex(shape):
        B[ii] = func(A[ii])
    return B
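For example (np.ptp here just stands in for any function that maps a 1D fiber to a scalar):
A = np.arange(27).reshape(3, 3, 3)
np.allclose(with_ndindex(A, np.sum), np.sum(A, axis=-1))   # True
with_ndindex(A, np.ptp, ax=0)    # peak-to-peak along the first axis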
I did some timings on 3x3x3, 10x10x10 and 100x100x100 A arrays. This np.ndindex approach is consistently about a third faster than the apply_along_axis approach. Direct use of np.sum(A, -1) is much faster than either.
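Something like the following reproduces the comparison (illustrative only; it assumes with_ndindex is defined in the same script):
import timeit

setup = ("import numpy as np; from __main__ import with_ndindex; "
         "A = np.random.rand(100, 100, 100)")
for stmt in ["with_ndindex(A, np.sum)",
             "np.apply_along_axis(np.sum, 2, A)",
             "np.sum(A, -1)"]:
    print(stmt, timeit.timeit(stmt, setup=setup, number=10))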
So if func is limited to operating on a 1D fiber (unlike sum), then the ndindex approach is a good choice.