
Say you have the following N-dimensional array

>>> import numpy as np
>>> Z = np.array(7*[6*[5*[4*[3*[range(2)]]]]])
>>> Z.ndim
6

Note that N = 6, but I want to keep it arbitrary in the discussion.


I then perform reductions over multiple axes, which -- inconveniently for my purposes -- "collapse" (as described here and there) the dimensions along which they are computed.

Say that the axes of computation are

>>> axes = (0,2,5)

so the tuple's length can be anything from 1 to N.


As you might have guessed, I want the output of, say, np.mean to have the same shape as its input. E.g.
>>> Y = np.mean(Z, axis=axes)
>>> Y.shape
(6L, 4L, 3L)

while

>>> Z.shape
(7L, 6L, 5L, 4L, 3L, 2L)

I have a homemade solution, as follows

def nd_repeat(arr, der, axes):
    if not isinstance(axes, tuple):
        axes = (axes,)
    # Reinsert the collapsed axes as length-1 dimensions...
    shape = list(arr.shape)
    for axis in axes:
        shape[axis] = 1
    # ...then let addition broadcast der back to arr's full shape.
    return np.zeros_like(arr) + der.reshape(shape)

where incidentally der stands for "derived".

>>> nd_repeat(Z, Y, axes).shape
(7L, 6L, 5L, 4L, 3L, 2L)

What is the numpy-builtin manner to accomplish this N-dimensional repeat?


Performance concerns, using timeit:

import timeit
homemade_s = """\
nd_repeat(Z, np.nanpercentile(Z, 99.9, axis=axes), axes)
"""
homemade_t = timeit.Timer(homemade_s, "from __main__ import nd_repeat,Z,axes,np").timeit(10000)

npbuiltin_s = """\
np.broadcast_to(np.nanpercentile(Z, 99.9, axis=axes, keepdims=True), Z.shape)
"""
npbuiltin_t = timeit.Timer(npbuiltin_s, "from __main__ import Z,axes,np").timeit(10000)

As can be expected

>>> np.log(homemade_t/npbuiltin_t)
0.024082885343423521

my solution is ~2.4% slower than hpaulj's.
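To sanity-check that the two routes agree, here is a self-contained sketch (rebuilding Z with np.arange rather than the nested-list construction above, which is an arbitrary choice of test data):

```python
import numpy as np

def nd_repeat(arr, der, axes):
    # Reinsert the collapsed axes as length-1 dims, then broadcast back.
    if not isinstance(axes, tuple):
        axes = (axes,)
    shape = list(arr.shape)
    for axis in axes:
        shape[axis] = 1
    return np.zeros_like(arr) + der.reshape(shape)

Z = np.arange(7 * 6 * 5 * 4 * 3 * 2, dtype=float).reshape(7, 6, 5, 4, 3, 2)
axes = (0, 2, 5)

homemade = nd_repeat(Z, np.mean(Z, axis=axes), axes)
builtin = np.broadcast_to(np.mean(Z, axis=axes, keepdims=True), Z.shape)

print(np.array_equal(homemade, builtin))  # True
```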

keepAlive

1 Answer


mean has a keepdims parameter that retains the compressed dimension:

In [139]: shape=(2,3,4,5)
In [140]: x=np.arange(np.prod(shape)).reshape(shape)
In [141]: m=x.mean(axis=2, keepdims=True)
In [142]: m.shape
Out[142]: (2, 3, 1, 5)

It's easy now to replicate m along that dimension:

In [144]: m1=np.broadcast_to(m,shape)
In [145]: m1.shape
Out[145]: (2, 3, 4, 5)

repeat and tile are also handy means of replicating an array along a dimension.
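For instance, the tile route can be sketched like this (reps of 1 everywhere except the collapsed axis):

```python
import numpy as np

shape = (2, 3, 4, 5)
x = np.arange(np.prod(shape)).reshape(shape)
m = x.mean(axis=2, keepdims=True)        # shape (2, 3, 1, 5)

# tile repeats the array a given number of times along each axis
m3 = np.tile(m, (1, 1, shape[2], 1))
print(m3.shape)  # (2, 3, 4, 5)
```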

broadcast_to expands the array without adding elements, just by changing shape and strides:

In [146]: m1.strides
Out[146]: (120, 40, 0, 8)
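One caveat worth noting: because of that zero stride, many elements of the view alias the same memory, so broadcast_to returns a read-only array:

```python
import numpy as np

shape = (2, 3, 4, 5)
x = np.arange(np.prod(shape)).reshape(shape)
m = x.mean(axis=2, keepdims=True)
m1 = np.broadcast_to(m, shape)

print(m1.strides[2])       # 0 -- the broadcast axis advances by no bytes
print(m1.flags.writeable)  # False
```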

repeat increases the size of the array:

In [148]: m2=np.repeat(m, shape[2], axis=2)
In [149]: m2.shape
Out[149]: (2, 3, 4, 5)
In [150]: m2.strides
Out[150]: (480, 160, 40, 8)

m can also be used directly, without either replication step, as in x - m; broadcasting aligns m with x automatically.
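A minimal sketch of that direct use, centering x along axis 2 without materializing any repeated array:

```python
import numpy as np

shape = (2, 3, 4, 5)
x = np.arange(np.prod(shape)).reshape(shape)
m = x.mean(axis=2, keepdims=True)

centered = x - m      # m broadcasts from (2, 3, 1, 5) to (2, 3, 4, 5)
print(centered.shape)                         # (2, 3, 4, 5)
print(np.allclose(centered.mean(axis=2), 0))  # True
```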

hpaulj