Un-broadcasting Numpy arrays

Question

In a large code base, I am using np.broadcast_to to broadcast arrays (just using simple examples here):

In [1]: x = np.array([1,2,3])

In [2]: y = np.broadcast_to(x, (2,1,3))

In [3]: y.shape
Out[3]: (2, 1, 3)

Elsewhere in the code, I use third-party functions that can operate in a vectorized way on Numpy arrays but that are not ufuncs. These functions don't understand broadcasting, which means that calling such a function on arrays like y is inefficient. Solutions such as Numpy's vectorize aren't good either because while they understand broadcasting, they introduce a for loop over the array elements which is then very inefficient.

Ideally, what I'd like to be able to do is to have a function, which we can call e.g. unbroadcast, that returns an array with a minimal shape that can be broadcasted back to the full size if needed. So e.g.:

In [4]: z = unbroadcast(y)

In [5]: z.shape
Out[5]: (1, 1, 3)

I can then run the third-party functions on z, then broadcast the result back to y.shape.

Is there a way to implement unbroadcast that relies on Numpy's public API? If not, are there any hacks that would produce the desired result?

What do you mean "minimal shape"? Won't the minimal shape in N dimensions returned by `unbroadcast` always be `(1, 1, ..., 1)` (or even `(1,)` )? — Alex Riley, Nov 28 '16 at 13:53
I mean the minimal shape that still contains all the required data to broadcast it back to the full array. So in the example above, ``z.shape`` is ``(1,1,3)`` not ``(1,1,1)``. — astrofrog, Nov 28 '16 at 13:56

score 4 · Answer 1 · answered Nov 28 '16 at 13:56

I have a possible solution, so will post it here (however if anyone has a better one, please feel free to reply too!). One solution is to check the strides argument of arrays, which will be 0 along broadcasted dimensions:

def unbroadcast(array):
    slices = []
    for i in range(array.ndim):
        if array.strides[i] == 0:
            slices.append(slice(0, 1))
        else:
            slices.append(slice(None))
    return array[slices]

This gives:

In [14]: unbroadcast(y).shape
Out[14]: (1, 1, 3)

Andras Deak -- Слава Україні · Accepted Answer · 2016-11-28T16:26:39.617

3

This is probably equivalent to your own solution, only a bit more built-in. It uses as_strided in numpy.lib.stride_tricks:

import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(16).reshape(2,1,8,1)  # shape (2,1,8,1)
y = np.broadcast_to(x,(2,3,8,5))    # shape (2,3,8,5) broadcast

def unbroadcast(arr):
    #determine unbroadcast shape
    newshape = np.where(np.array(arr.strides) == 0,1,arr.shape) # [2,1,8,1], thanks to @Divakar
    return as_strided(arr,shape=newshape)    # strides are automatically set here

z = unbroadcast(x)
np.all(z==x)  # is True

Note that in my original answer I didn't define a function, and the resulting z array had (64,0,8,0) as strides, whereas the input has (64,64,8,8). In the current version the returned z array has identical strides to x, I guess passing and returning the array forces a creation of a copy. Anyway, we could always set the strides manually in as_strided to get identical arrays under all circumstances, but this doesn't seem necessary in the above setup.

edited Nov 28 '16 at 16:26

answered Nov 28 '16 at 14:12

Andras Deak -- Слава Україні

33,737
11
83
111

1

Or `np.where(np.array(y.strides) == 0,1,y.shape)` for the newshape? – Divakar Nov 28 '16 at 14:13
@Divakar right, I keep forgetting `np.where`:) Obviously that's the numpy-idiomatic version, thank you. – Andras Deak -- Слава Україні Nov 28 '16 at 14:14
For posterity, it might be nice to edit this to make the two key lines into an 'unbroadcast' function? – astrofrog Nov 28 '16 at 16:16
@astrofrog thanks, I was hoping that anybody trying to do this can turn it into a function;) Anyway, I edited my answer. Turns out that this results in an array that has the same strides as `x`. – Andras Deak -- Слава Україні Nov 28 '16 at 16:27

Un-broadcasting Numpy arrays

2 Answers2