I have an M x N matrix X and a 1 x N matrix Y. What I would like to do is replace every 0 entry in X with the value from Y for that entry's column.

So if

X = np.array([[0, 1, 2], [3, 0, 5]])

and

Y = np.array([10, 20, 30])

The desired end result would be [[10, 1, 2], [3, 20, 5]].

This can be done straightforwardly by generating an M x N matrix in which every row is Y and then using boolean masks:

Y = np.ones((X.shape[0], 1)) * Y.reshape(1, -1)
X[X==0] = Y[X==0]

But could this be done using numpy's broadcasting functionality?

David R

2 Answers

Sure. Instead of physically repeating Y, create a broadcasted view of Y with the shape of X, using numpy.broadcast_to:

expanded = np.broadcast_to(Y, X.shape)

mask = X == 0
X[mask] = expanded[mask]
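
As a possible shortcut, the three-argument form of np.where broadcasts its arguments itself, so the expanded view does not have to be built by hand. A minimal sketch, assuming Y has shape (N,) or (1, N); the names result and X_inplace are only illustrative:

```python
import numpy as np

X = np.array([[0, 1, 2], [3, 0, 5]])
Y = np.array([10, 20, 30])

# Y (shape (3,)) is broadcast against X (shape (2, 3)): wherever X is 0,
# the value from the matching column of Y is taken, otherwise X is kept.
result = np.where(X == 0, Y, X)   # returns a new array, X is untouched
print(result)

# In-place variant: np.copyto broadcasts the source (Y) to the destination
# and only writes where the mask is True.
X_inplace = X.copy()
np.copyto(X_inplace, Y, where=(X_inplace == 0))
print(X_inplace)
```

np.where returns a new array, while np.copyto modifies its destination, so the choice depends on whether X itself should change.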
user2357112

Expand X to make it a bit more general:

In [306]: X = np.array([[0, 1, 2], [3, 0, 5],[0,1,0]])

np.where identifies the 0s; the second array of its result holds their column indices

In [307]: idx = np.where(X==0)
In [308]: idx
Out[308]: (array([0, 1, 2, 2]), array([0, 1, 0, 2]))


In [309]: Z = X.copy()
In [310]: Z[idx]
Out[310]: array([0, 0, 0, 0])       # flat list of where to put the values
In [311]: Y[idx[1]]
Out[311]: array([10, 20, 10, 30])   # matching list of values by column

In [312]: Z[idx] = Y[idx[1]]
In [313]: Z
Out[313]: 
array([[10,  1,  2],
       [ 3, 20,  5],
       [10,  1, 30]])

This does not use broadcasting, but it is reasonably clean numpy.
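
For use outside an interactive session, the same indexing steps can be wrapped in a small function. This is only a sketch: fill_zeros_by_column is a hypothetical name, and it assumes X is 2-D and Y has one entry per column of X.

```python
import numpy as np

def fill_zeros_by_column(X, Y):
    """Return a copy of X with each 0 replaced by Y's entry for that column.

    Hypothetical helper; assumes X is 2-D and Y has X.shape[1] elements.
    """
    Z = X.copy()
    rows, cols = np.where(Z == 0)      # row and column indices of the zeros
    Z[rows, cols] = np.ravel(Y)[cols]  # pick Y's value for each zero's column
    return Z

X = np.array([[0, 1, 2], [3, 0, 5], [0, 1, 0]])
Y = np.array([10, 20, 30])
filled = fill_zeros_by_column(X, Y)
print(filled)
```

Working on a copy keeps the caller's X unchanged; np.ravel lets Y be passed either as shape (N,) or as shape (1, N).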


Timings compared with the broadcast_to approach:

In [314]: %%timeit 
     ...: idx = np.where(X==0)
     ...: Z[idx] = Y[idx[1]]
     ...: 
9.28 µs ± 157 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [315]: %%timeit
     ...: exp = np.broadcast_to(Y,X.shape)
     ...: mask=X==0
     ...: Z[mask] = exp[mask]
     ...: 
19.5 µs ± 513 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The where approach is faster here, though the test arrays are very small.

Another way to make the expanded Y is with repeat:

In [319]: %%timeit
     ...: exp = np.repeat(Y[None,:],3,0)
     ...: mask=X==0
     ...: Z[mask] = exp[mask]
     ...: 
10.8 µs ± 55.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

This time is close to that of my where version. It turns out that broadcast_to is relatively slow for these small arrays:

In [321]: %%timeit
     ...: exp = np.broadcast_to(Y,X.shape)
     ...: 
10.5 µs ± 52.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [322]: %%timeit
     ...: exp = np.repeat(Y[None,:],3,0)
     ...: 
3.76 µs ± 11.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

We'd have to do more tests to see whether that is just due to a setup cost, or if the relative times still apply with much larger arrays.
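
The larger-array test suggested above could be sketched as follows. The array size, the random data, and the function names are assumptions made for illustration, and no timings are quoted because they depend on the machine and the numpy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical test data: 1000 x 1000 integers in [0, 10), so about 10% zeros
X = rng.integers(0, 10, size=(1000, 1000))
Y = rng.integers(10, 100, size=1000)

def with_where_index():
    Z = X.copy()
    idx = np.where(Z == 0)
    Z[idx] = Y[idx[1]]
    return Z

def with_broadcast_to():
    Z = X.copy()
    mask = Z == 0
    Z[mask] = np.broadcast_to(Y, Z.shape)[mask]
    return Z

def with_repeat():
    Z = X.copy()
    mask = Z == 0
    Z[mask] = np.repeat(Y[None, :], Z.shape[0], 0)[mask]
    return Z

# Each timing includes the X.copy() call, which is the same cost for all three.
for fn in (with_where_index, with_broadcast_to, with_repeat):
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

Running it with a few different shapes would show whether the gap seen on the 3 x 3 example is a fixed setup cost or scales with the array size.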

hpaulj