Get mean of 2D slice of a 3D array in numpy

Question

I have a numpy array with a shape of:

(11L, 5L, 5L)

I want to calculate the mean over the 25 elements of each 'slice' of the array [0, :, :], [1, :, :] etc, returning 11 values.

It seems silly, but I can't work out how to do this. I've thought the mean(axis=x) function would do this, but I've tried all possible combinations of axis and none of them give me the result I want.

I can obviously do this using a for loop and slicing, but surely there is a better way?

J. Martinot-Lagarde · Accepted Answer · 2013-08-21T15:10:02.787

28

Use a tuple for axis :

>>> a = np.arange(11*5*5).reshape(11,5,5)
>>> a.mean(axis=(1,2))
array([  12.,   37.,   62.,   87.,  112.,  137.,  162.,  187.,  212.,
        237.,  262.])

Edit: This works only with numpy version 1.7+.

edited Aug 21 '13 at 15:10

answered Aug 21 '13 at 12:39

J. Martinot-Lagarde

3,280
2
15
17

2

It works? One would think so for 1.7 and afterwards, but the docs still say only one axis. – Jaime Aug 21 '13 at 13:17
1

Didn't thought about the numpy version, I have 1.7.1 and it works. It's not in the documentation but the changelog is talking about ufuncs: http://www.softpedia.com/progChangelog/Numpy-Changelog-103892.html – J. Martinot-Lagarde Aug 21 '13 at 15:09
2

Cool, didn't know this had been added ! – lmjohns3 Aug 21 '13 at 16:41

Saullo G. P. Castro · Answer 2 · 2013-08-21T12:21:20.813

8

You can reshape(11, 25) and then call mean only once (faster):

a.reshape(11, 25).mean(axis=1)

Alternatively, you can call np.mean twice (about 2X slower on my computer):

a.mean(axis=2).mean(axis=1)

edited Aug 21 '13 at 12:21

answered Aug 21 '13 at 12:14

Saullo G. P. Castro

56,802
26
179
234

1

I think this is the most straightforward answer, though the einsum does seem to be faster. – lmjohns3 Aug 21 '13 at 16:43

Daniel · Answer 3 · 2013-08-21T17:45:57.183

5

Can always use np.einsum:

>>> a = np.arange(11*5*5).reshape(11,5,5)
>>> np.einsum('...ijk->...i',a)/(a.shape[-1]*a.shape[-2])
array([ 12,  37,  62,  87, 112, 137, 162, 187, 212, 237, 262])

Works on higher dimensional arrays (all of these methods would if the axis labels are changed):

>>> a = np.arange(10*11*5*5).reshape(10,11,5,5)
>>> (np.einsum('...ijk->...i',a)/(a.shape[-1]*a.shape[-2])).shape
(10, 11)

Faster to boot:

a = np.arange(11*5*5).reshape(11,5,5)

%timeit a.reshape(11, 25).mean(axis=1)
10000 loops, best of 3: 21.4 us per loop

%timeit a.mean(axis=(1,2))
10000 loops, best of 3: 19.4 us per loop

%timeit np.einsum('...ijk->...i',a)/(a.shape[-1]*a.shape[-2])
100000 loops, best of 3: 8.26 us per loop

Scales slightly better then the other methods as array size increases.

Using dtype=np.float64 does not change the above timings appreciably, so just to double check:

a = np.arange(110*50*50,dtype=np.float64).reshape(110,50,50)

%timeit a.reshape(110,2500).mean(axis=1)
1000 loops, best of 3: 307 us per loop

%timeit a.mean(axis=(1,2))
1000 loops, best of 3: 308 us per loop

%timeit np.einsum('...ijk->...i',a)/(a.shape[-1]*a.shape[-2])
10000 loops, best of 3: 145 us per loop

Also something that is interesting:

%timeit np.sum(a) #37812362500.0
100000 loops, best of 3: 293 us per loop

%timeit np.einsum('ijk->',a) #37812362500.0
100000 loops, best of 3: 144 us per loop

edited Aug 21 '13 at 17:45

answered Aug 21 '13 at 13:14

Daniel

19,179
7
60
74

1

I think the speed is coming from your call to `np.einsum` using an `int` accumulator, instead of the `float` or `double`, not sure, that `np.mean` uses. That is a risky thing to do with computing statistics, as you can overflow the accumulator and get very wrong results. Giving `np.einsum` a `dtype=np.float` or `dtype=np.double` would both make the calculation more robust, and (I am guessing here) more similar in performance to the standard functions. But `np.einsum` is still a super cool function, so you get your +1... – Jaime Aug 21 '13 at 17:13
@Jamie. That was my thought as well, but in my intial testing showed that `einsum` was actually faster for any size and dtype. I have updated the post with `np.double` timings. – Daniel Aug 21 '13 at 17:20
@Ophion... it is weird that `sum()` does not give the same speed that `einsum()`... very well observed... actually the second faster method to calculate the mean would be: `timeit a.sum(axis=(1,2))/a.shape[-1]/a.shape[-2]` – Saullo G. P. Castro Aug 21 '13 at 18:26
1

@Ophion I think you should post a question like "why is `np.einsum()` faster than `np.sum()`?" to open this topic for discussion in more detail... – Saullo G. P. Castro Aug 21 '13 at 18:27
1

@SaulloCastro I was writing a question to that effect just now. Using `a.sum(axis=(1,2)...` is equivalent in time to the `a.mean(axis=(1,2))` function. – Daniel Aug 21 '13 at 18:35
@Ophion I just saw it! Great question! – Saullo G. P. Castro Aug 21 '13 at 18:37

Get mean of 2D slice of a 3D array in numpy

3 Answers3

Linked