Select multiple rows multiple times from a numpy array at once avoiding loops

Question

I am looking for a way to select multiple rows from a numpy array multiple times given an array of indexes.

Given M and indexes, I would like to get N without a for loop, since looping is slow for large arrays.

import numpy as np
M = np.array([[1, 0, 1, 1, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1]])
indexes = np.array([[True, False, False, True],
                    [False, True, True, True],
                    [False, False, True, False],
                    [True, True, False, True]])
N = [M[index] for index in indexes]


Out: 
[array([[1, 0, 1, 1, 0],
        [1, 0, 0, 1, 1]]),
 array([[1, 1, 1, 1, 0],
        [0, 0, 0, 1, 1],
        [1, 0, 0, 1, 1]]),
 array([[0, 0, 0, 1, 1]]),
 array([[1, 0, 1, 1, 0],
        [1, 1, 1, 1, 0],
        [1, 0, 0, 1, 1]])]

The fact that you get a list of arrays that differ in shape strongly suggests that this list comprehension is the best you can do. — hpaulj, Oct 29 '20 at 22:05
Numpy is usually at its best when handling homogeneous data while your expected output is not. Loop seems like the best choice here. — Quang Hoang, Oct 29 '20 at 22:06
@hpaul is list comprehension really better than `np.split` here? — mathfux, Oct 29 '20 at 22:37

score 1 · Answer 1 · answered Oct 29 '20 at 22:36

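We can take advantage of the fact that the output data is homogeneous in at least one dimension: every selected row has the same length, so the rows can be gathered in a single indexing operation and then split into groups.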

x, y = np.where(indexes)
split_idx = np.flatnonzero(np.diff(x))+1
output = np.split(M[y], split_idx)

Sample run:

>>> x
array([0, 0, 1, 1, 1, 2, 3, 3, 3], dtype=int32)
>>> y
array([0, 3, 1, 2, 3, 2, 0, 1, 3], dtype=int32)
>>> split_idx
array([2, 5, 6], dtype=int32)
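As a quick check, the sketch below wraps the three lines above in a helper (the name `select_rows_split` is made up for illustration) and compares the result with the list comprehension from the question. It also shows one caveat that follows from how `np.diff(x)` is used: a mask row with no `True` entries never appears in `x`, so its empty group is dropped from the output.

```python
import numpy as np

M = np.array([[1, 0, 1, 1, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1]])
indexes = np.array([[True, False, False, True],
                    [False, True, True, True],
                    [False, False, True, False],
                    [True, True, False, True]])

def select_rows_split(M, indexes):
    """np.where + np.split approach (helper name is hypothetical)."""
    x, y = np.where(indexes)                    # x: mask row, y: selected row of M
    split_idx = np.flatnonzero(np.diff(x)) + 1  # positions where x changes value
    return np.split(M[y], split_idx)

expected = [M[index] for index in indexes]
output = select_rows_split(M, indexes)
matches = len(output) == len(expected) and all(
    np.array_equal(a, b) for a, b in zip(output, expected))
print(matches)

# Caveat: clear one mask row entirely. That row contributes nothing to x,
# so the result has one group fewer than the number of mask rows.
indexes_with_empty = indexes.copy()
indexes_with_empty[2] = False
print(len(select_rows_split(M, indexes_with_empty)), "groups for",
      len(indexes_with_empty), "mask rows")
```

If every mask row is guaranteed to select at least one row, the caveat does not apply.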

answered Oct 29 '20 at 22:36

mathfux


For this small example, the straightforward list comprehension is faster. But the alternatives may scale differently. `split` has to loop as well, taking multiple slices. Scaling may depend on the number of rows versus columns. – hpaulj Oct 29 '20 at 23:43
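One way to check how the two approaches scale is to time them on larger random inputs. The following is only a sketch: the array sizes, the seed, the 0.5 mask density, and the function names are arbitrary assumptions, and the timings will depend on the machine and on the ratio of rows to columns.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, n_masks = 200, 50, 300        # arbitrary sizes for illustration
M_big = rng.integers(0, 2, size=(n_rows, n_cols))
masks = rng.random((n_masks, n_rows)) < 0.5   # one boolean mask per output array

def with_comprehension():
    return [M_big[mask] for mask in masks]

def with_split():
    x, y = np.where(masks)
    return np.split(M_big[y], np.flatnonzero(np.diff(x)) + 1)

# Both functions should agree before their timings are compared.
a_list, b_list = with_comprehension(), with_split()
same = len(a_list) == len(b_list) and all(
    np.array_equal(a, b) for a, b in zip(a_list, b_list))
print("results agree:", same)

repeats = 20
for fn in (with_comprehension, with_split):
    seconds = timeit.timeit(fn, number=repeats)
    print(f"{fn.__name__}: {seconds / repeats * 1e3:.3f} ms per call")
```

Varying `n_rows`, `n_cols`, and `n_masks` independently should show which dimension dominates for a given workload.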

fountainhead · Answer 2 · 2020-10-30T13:15:17.807

A slightly different approach that uses broadcasting and a different way of identifying the split points:

b_shape = (indexes.shape[0],) + M.shape  # New shape for broadcasted M. Here, (4,4,5)
M_b = np.broadcast_to(M, b_shape)        # Broadcasted M with the new shape.
                                         # (it uses views instead of replicating data)
r,c = np.nonzero(indexes)
result_joined = M_b[r,c,:]                             # The stack of all the selected rows from M
split_points = np.cumsum(np.sum(indexes, axis=1))[:-1] # Identify where to split.
result_split = np.split(result_joined, split_points)   # Final result, obtained by splitting.

print(result_split)

Output:

[array([[1, 0, 1, 1, 0],
       [1, 0, 0, 1, 1]]),
array([[1, 1, 1, 1, 0],
       [0, 0, 0, 1, 1],
       [1, 0, 0, 1, 1]]),
array([[0, 0, 0, 1, 1]]),
array([[1, 0, 1, 1, 0],
       [1, 1, 1, 1, 0],
       [1, 0, 0, 1, 1]])]
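Two observations about this approach, sketched below under the assumption that `M` and `indexes` are the arrays from the question. First, every slice of the broadcast array is `M` itself, so `M_b[r, c, :]` should equal `M[c]` and the broadcast step can likely be skipped. Second, because the split points come from per-row counts, a mask row with no `True` entries still yields an empty array in its position. The names `same_rows` and `indexes_with_empty` are introduced here for illustration only.

```python
import numpy as np

M = np.array([[1, 0, 1, 1, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1],
              [1, 0, 0, 1, 1]])
indexes = np.array([[True, False, False, True],
                    [False, True, True, True],
                    [False, False, True, False],
                    [True, True, False, True]])

r, c = np.nonzero(indexes)

# Indexing the broadcast view selects the same rows as indexing M directly.
M_b = np.broadcast_to(M, (indexes.shape[0],) + M.shape)
same_rows = np.array_equal(M_b[r, c, :], M[c])
print(same_rows)

# Split points from the cumulative count of True values per mask row.
split_points = np.cumsum(indexes.sum(axis=1))[:-1]
result = np.split(M[c], split_points)

# With an all-False mask row, the corresponding group is an empty (0, 5) array.
indexes_with_empty = indexes.copy()
indexes_with_empty[2] = False
_, c2 = np.nonzero(indexes_with_empty)
points2 = np.cumsum(indexes_with_empty.sum(axis=1))[:-1]
result_with_empty = np.split(M[c2], points2)
print([part.shape for part in result_with_empty])
# [(2, 5), (3, 5), (0, 5), (3, 5)]
```

The count-based split points therefore keep the output list aligned with the rows of `indexes`, which matters when the position of each group carries meaning.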
