Efficiently index 2d numpy array using two 1d arrays

Question

I have a large 2d numpy array and two 1d arrays that represent x/y indexes within the 2d array. I want to use these 1d arrays to perform an operation on the 2d array. I can do this with a for loop, but it's very slow when working on a large array. Is there a faster way? I tried using the 1d arrays simply as indexes but that didn't work. See this example:

import numpy as np

# Two example 2d arrays
cnt_a   =   np.zeros((4,4))
cnt_b   =   np.zeros((4,4))

# 1d arrays holding x and y indices
xpos    =   [0,0,1,2,1,2,1,0,0,0,0,1,1,1,2,2,3]
ypos    =   [3,2,1,1,3,0,1,0,0,1,2,1,2,3,3,2,0]

# This method works, but is very slow for a large array
for i in range(0,len(xpos)):
    cnt_a[xpos[i],ypos[i]] = cnt_a[xpos[i],ypos[i]] + 1

# This method is fast, but gives incorrect answer
cnt_b[xpos,ypos] = cnt_b[xpos,ypos]+1


# Print the results
print 'Good:'
print cnt_a
print ''
print 'Bad:'
print cnt_b

The output from this is:

Good:
[[ 2.  1.  2.  1.]
 [ 0.  3.  1.  2.]
 [ 1.  1.  1.  1.]
 [ 1.  0.  0.  0.]]

Bad:
[[ 1.  1.  1.  1.]
 [ 0.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  0.  0.  0.]]

For the cnt_b array numpy is obviously not summing correctly, but I'm unsure how to fix this without resorting to the (v. inefficient) for loop used to calculate cnt_a.

you can about double the speed of the first for loop by changing the line in the loop to `cnt_a[xpos[i],ypos[i]] += 1`. — Zinki, Jun 28 '17 at 12:16
With `x` and `y` in a `2-column` array, here's one related [`Q&A`](https://stackoverflow.com/questions/36863404/accumulate-constant-value-in-numpy-array). — Divakar, Jun 28 '17 at 13:04

Imanol Luengo · Answer 1 · 2017-06-28T12:39:53.780

3

Another approach by using 1D indexing (suggested by @Shai) extended to answer the actual question:

>>> out = np.zeros((4, 4))
>>> idx = np.ravel_multi_index((xpos, ypos), out.shape) # extract 1D indexes
>>> x = np.bincount(idx, minlength=out.size)
>>> out.flat += x

np.bincount calculates how many times each of the index is present in the xpos, ypos and stores them in x.

Or, as suggested by @Divakar:

>>> out.flat += np.bincount(idx, minlength=out.size)

edited Jun 28 '17 at 12:39

answered Jun 28 '17 at 12:32

Imanol Luengo

15,366
2
49
67

Should be faster than `np.add.at`! – Divakar Jun 28 '17 at 12:32
@Divakar I'm not sure about that.. I guess depends on how many indexes are present and how big the output array is. If the number of indexes is large enough it might be quicker, but if the target array is huge and only few indexes are going to be modified `np.add.at` should be quicker. – Imanol Luengo Jun 28 '17 at 12:36
1

Or maybe do : `out.ravel() = np.binconunt...`, to save one step? Flattened view for help? – Divakar Jun 28 '17 at 12:37
1

Yes `out.ravel() = ... ` doesn't work for some weird reason (`can't assign to function call` error in python3), but `out.flat = ...` should work just fine. – Imanol Luengo Jun 28 '17 at 12:38

score 2 · Accepted Answer · answered Jun 28 '17 at 12:28

We could compute the linear indices, then accumulate into zeros-initialized output array with np.add.at. Thus, with xpos and ypos as arrays, here's one implementation -

m,n = xpos.max()+1, ypos.max()+1
out = np.zeros((m,n),dtype=int)
np.add.at(out.ravel(), xpos*n+ypos, 1)

Sample run -

In [95]: # 1d arrays holding x and y indices
    ...: xpos    =   np.array([0,0,1,2,1,2,1,0,0,0,0,1,1,1,2,2,3])
    ...: ypos    =   np.array([3,2,1,1,3,0,1,0,0,1,2,1,2,3,3,2,0])
    ...: 

In [96]: cnt_a   =   np.zeros((4,4))

In [97]: # This method works, but is very slow for a large array
    ...: for i in range(0,len(xpos)):
    ...:     cnt_a[xpos[i],ypos[i]] = cnt_a[xpos[i],ypos[i]] + 1
    ...:     

In [98]: m,n = xpos.max()+1, ypos.max()+1
    ...: out = np.zeros((m,n),dtype=int)
    ...: np.add.at(out.ravel(), xpos*n+ypos, 1)
    ...: 

In [99]: cnt_a
Out[99]: 
array([[ 2.,  1.,  2.,  1.],
       [ 0.,  3.,  1.,  2.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  0.,  0.,  0.]])

In [100]: out
Out[100]: 
array([[2, 1, 2, 1],
       [0, 3, 1, 2],
       [1, 1, 1, 1],
       [1, 0, 0, 0]])

Thanks, this is a bit faster than my original solution. It also has the advantage of working with any 'addition' to the original array, rather than simply adding 1 (through the last variable passed to np.add.at). — os1, Jun 28 '17 at 13:38

smagnan · Answer 3 · 2017-06-28T12:29:21.493

you can iterate on both lists at once, and increment for each couple (if you are not used to it, zip can combine lists)

for x, y in zip(xpos, ypos):
    cnt_b[x][y] += 1

But this will be about the same speed as your solution A. If your lists xpos/ypos are of length n, I don't see how you can update your matrix in less than o(n) since you'll have to check each pair one way or an other.

Other solution: you could count (with collections.Counter possibly) the similar index pairs (ex: (0, 3) etc...) and update the matrix with the count value. But I doubt it would be much faster, since you the time gained on updating the matrix would be lost on counting multiple occurrences.

Maybe I am totally wrong tho, in which case I'd be curious too to see a not o(n) answer

Shai · Answer 4 · 2017-06-28T12:32:31.180

0

I think you are looking for ravel_multi_index funciton

lidx = np.ravel_multi_index((xpos, ypos), cnt_a.shape)

converts to "flatten" 1D indices into cnt_a and cnt_b:

np.add.at( cnt_b, lidx, 1 )

edited Jun 28 '17 at 12:32

answered Jun 28 '17 at 12:21

Shai

111,146
38
238
371

2

I think the problem is with indexing the same coordinates twice, not with the indexes itself (e.g. the index `[0,0]` is twice in the list). – Imanol Luengo Jun 28 '17 at 12:22

Efficiently index 2d numpy array using two 1d arrays

4 Answers4

Linked