
I have two arrays of indices, each of shape m. I need to take the mean of the values between those indices in an array of shape m x n. Can this be done without iterating through each row? What is the fastest way to do this?

idx0 = np.array([1, 3, 2, 5])
idx1 = np.array([5, 8, 6, 7])
d = np.array([[1,2,3,4,5,6,7,8,9], [1,2,3,4,5,6,7,8,9], [1,2,3,4,5,6,7,8,9], [1,2,3,4,5,6,7,8,9]])

Desired result (the per-row slice syntax below is pseudocode, not valid NumPy):

<< d[np.arange(d.shape[0]), idx0:idx1]
>> np.array([3.5, 6, 4.5, 6.5])

The best ways I've found to do this are list comprehension or using a for loop with numba, but this seems like it should be a common problem? Thanks.
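For reference, the explicit-loop version mentioned above looks something like this (a minimal sketch; shown without numba's @njit decorator so it runs with plain NumPy, but it is the kind of function numba can compile):

```python
import numpy as np

def range_means(d, idx0, idx1):
    # Explicit row loop: slice each row between its own pair of indices.
    out = np.empty(d.shape[0])
    for i in range(d.shape[0]):
        out[i] = d[i, idx0[i]:idx1[i]].mean()
    return out

idx0 = np.array([1, 3, 2, 5])
idx1 = np.array([5, 8, 6, 7])
d = np.tile(np.arange(1, 10), (4, 1))  # the 4 x 9 example array above
```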

Mad Physicist
dotto
  • I don't understand what you mean by "mean of the values in between the indices". Can you explain how the desired results derive from the inputs? – Barmar May 08 '23 at 21:04
  • For the first pair, (1, 5), I'm taking (2+3+4+5)/4 = 3.5 – dotto May 08 '23 at 21:25

3 Answers


In your case you'd still need to take a slice over each row's pairwise range of indices from idx0 and idx1.
zip + enumerate with a per-row slice is a straightforward way (note that the slice must be taken per row, not as d[:, slice(*a)] over all rows, which would only give the right answer here because every row of d happens to be identical):

[d[i, a:b].mean() for i, (a, b) in enumerate(zip(idx0, idx1))]

[3.5, 6.0, 4.5, 6.5]
RomanPerekhrest

If you work with d.ravel() instead of d, things will be easier, because now you can find the sums of elements using np.add.reduceat and compute averages directly from that.

The linear indices in d.ravel() corresponding to idx0 and idx1 are

idx0r = idx0 + d.shape[1] * np.arange(d.shape[0])
idx1r = idx1 + d.shape[1] * np.arange(d.shape[0])
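
With the example arrays above (d is 4 x 9), these come out to (a quick check of the arithmetic):

```python
import numpy as np

idx0 = np.array([1, 3, 2, 5])
idx1 = np.array([5, 8, 6, 7])
n_rows, n_cols = 4, 9  # shape of d in the question

offsets = n_cols * np.arange(n_rows)  # [0, 9, 18, 27]
idx0r = idx0 + offsets                # [ 1, 12, 20, 32]
idx1r = idx1 + offsets                # [ 5, 17, 24, 34]
```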

To obtain a linear index of ranges to sum, you can do

idx = (np.stack((idx0, idx1), axis=-1) + d.shape[1] * np.arange(d.shape[0])[:, None]).ravel()

The reason that you want to have a 1D array for this is that you need to be able to remove the last element if it is equal to d.size, since np.add.reduceat does not accept indices past the end of the array:

if idx[-1] == d.size:
    idx = idx[:-1]

This last operation does not copy the data: it just creates a view that is one element short.

You can now sum directly:

sums = np.add.reduceat(d.ravel(), idx)[::2]

Taking every other element is required because reduceat also produces the sums between the n-th element of idx1 and the (n+1)-st element of idx0; the [::2] stride discards those.
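On the sample data, idx interleaves the start and stop positions of every row's range, so reduceat returns both the wanted sums and the gap sums between them:

```python
import numpy as np

idx0 = np.array([1, 3, 2, 5])
idx1 = np.array([5, 8, 6, 7])
d = np.tile(np.arange(1, 10), (4, 1))

idx = (np.stack((idx0, idx1), axis=-1)
       + d.shape[1] * np.arange(d.shape[0])[:, None]).ravel()
# idx is [1, 5, 12, 17, 20, 24, 32, 34]: starts and stops interleaved.

sums = np.add.reduceat(d.ravel(), idx)[::2]  # keep only the [start, stop) sums
```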

Now you can normalize by the lengths of each segment:

means = sums / (idx1 - idx0)

TL;DR

idx = (np.stack((idx0, idx1), axis=-1) + d.shape[1] * np.arange(d.shape[0])[:, None]).ravel()
if idx[-1] == d.size:
    idx = idx[:-1]
means = np.add.reduceat(d.ravel(), idx)[::2] / (idx1 - idx0)
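
Running the TL;DR on the question's data (assuming the 4 x 9 d from the question) reproduces the desired result:

```python
import numpy as np

idx0 = np.array([1, 3, 2, 5])
idx1 = np.array([5, 8, 6, 7])
d = np.tile(np.arange(1, 10), (4, 1))

idx = (np.stack((idx0, idx1), axis=-1)
       + d.shape[1] * np.arange(d.shape[0])[:, None]).ravel()
if idx[-1] == d.size:
    idx = idx[:-1]  # reduceat rejects an index equal to the array length
means = np.add.reduceat(d.ravel(), idx)[::2] / (idx1 - idx0)
```

means comes out as array([3.5, 6. , 4.5, 6.5]).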
Mad Physicist

Another way you can do it is masking instead of indexing. First you create the mask:

mask = np.zeros((d.shape[0], d.shape[1] + 1), np.int8)
mask[np.arange(d.shape[0]), idx0] = 1
mask[np.arange(d.shape[0]), idx1] = -1
mask = mask.cumsum(axis=1)[:, :-1].astype(bool)
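
For the sample arrays, the cumulative sum turns the +1/-1 markers into contiguous runs of True between each row's idx0 and idx1 (a quick check):

```python
import numpy as np

idx0 = np.array([1, 3, 2, 5])
idx1 = np.array([5, 8, 6, 7])
n_rows, n_cols = 4, 9  # shape of d in the question

mask = np.zeros((n_rows, n_cols + 1), np.int8)
mask[np.arange(n_rows), idx0] = 1    # +1 marks where a run starts
mask[np.arange(n_rows), idx1] = -1   # -1 marks one past where it ends
mask = mask.cumsum(axis=1)[:, :-1].astype(bool)
# Row 0 is True at columns 1..4, row 1 at 3..7, and so on;
# each row has exactly idx1 - idx0 True entries.
```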

The initial array is created with one extra column to accommodate the case where idx1 contains elements equal to d.shape[1].

There are a number of approaches you can take going forward.

  1. The first is probably simplest: using a masked array:

    means = np.ma.array(d, mask=~mask).mean(axis=1)
    
  2. Another approach is to pre-multiply the data by the mask and compute your own sums:

    means = (d * mask).sum(axis=1) / (idx1 - idx0)
    
  3. A third approach is to apply the mask and use np.add.reduceat similarly to how the other answer suggests, but on the masked data, which avoids summing elements that are never needed:

    n = (idx1 - idx0)
    means = np.add.reduceat(d[mask], np.r_[0, n[:-1].cumsum()]) / n
    
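All three variants give the same result on the question's data; a quick side-by-side check:

```python
import numpy as np

idx0 = np.array([1, 3, 2, 5])
idx1 = np.array([5, 8, 6, 7])
d = np.tile(np.arange(1, 10), (4, 1))

mask = np.zeros((d.shape[0], d.shape[1] + 1), np.int8)
mask[np.arange(d.shape[0]), idx0] = 1
mask[np.arange(d.shape[0]), idx1] = -1
mask = mask.cumsum(axis=1)[:, :-1].astype(bool)

m1 = np.ma.array(d, mask=~mask).mean(axis=1)                  # 1. masked array
m2 = (d * mask).sum(axis=1) / (idx1 - idx0)                   # 2. pre-multiplied sums
n = idx1 - idx0
m3 = np.add.reduceat(d[mask], np.r_[0, n[:-1].cumsum()]) / n  # 3. reduceat on masked data
```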
Mad Physicist