
I have two arrays say:

A = np.array([[ 1.  ,  1.  ,  0.5 ],
              [ 2.  ,  2.  ,  0.7 ],
              [ 3.  ,  4.  ,  1.2 ],
              [ 4.  ,  3.  ,  2.33],
              [ 1.  ,  2.  ,  0.5 ],
              [ 6.  ,  5.  ,  0.3 ],
              [ 4.  ,  5.  ,  1.2 ],
              [ 5.  ,  5.  ,  1.5 ]])

B = np.array([2,1])

I want to find all rows of A whose (x, y) coordinates (the first two columns) are not within a radius of 2 from B.

My answer should be:

C = [[3,4,1.2],[4,3,2.33],[6,5,0.3],[4,5,1.2],[5,5,1.5]]

Is there a pythonic way to do this?

What I have tried is:

radius = 2
C.append(np.extract((cdist(A[:, :2], B[np.newaxis]) > radius), A))

But I realized that np.extract flattens A, so I don't get the expected result.

Divakar

2 Answers


Let R be the radius. There are several ways to solve this, discussed next.

Approach #1 : Using cdist -

from scipy.spatial.distance import cdist

A[(cdist(A[:,:2],B[None]) > R).ravel()]
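To see why the .ravel() is needed here, it may help to look at the shapes involved. The following is a minimal sketch on the question's data, assuming R = 2; the names dists, mask and C are only for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[1., 1., 0.5],
              [2., 2., 0.7],
              [3., 4., 1.2],
              [4., 3., 2.33],
              [1., 2., 0.5],
              [6., 5., 0.3],
              [4., 5., 1.2],
              [5., 5., 1.5]])
B = np.array([2, 1])
R = 2

# cdist expects two 2-D inputs, so B is turned into a (1, 2) array with B[None].
dists = cdist(A[:, :2], B[None])   # shape (8, 1): a single column of distances
mask = (dists > R).ravel()         # shape (8,): one boolean per row of A

# Indexing with a 1-D boolean mask keeps whole rows, unlike np.extract,
# which returns a flattened result.
C = A[mask]
print(C)
```

The mask has one entry per row of A, so A[mask] selects complete rows, including the third column.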

Approach #2 : Using np.einsum -

d = A[:,:2] - B
out = A[np.einsum('ij,ij->i', d,d) > R**2]

Approach #3 : Using np.linalg.norm -

A[np.linalg.norm(A[:,:2] - B, axis=1) > R]

Approach #4 : Using matrix-multiplication with np.dot -

A[(A[:,:2]**2).sum(1) + (B**2).sum() - 2*A[:,:2].dot(B) > R**2]

Approach #5 : Using a combination of einsum and matrix-multiplication -

A[np.einsum('ij,ij->i',A[:,:2],A[:,:2]) + B.dot(B) - 2*A[:,:2].dot(B) > R**2]

Approach #6 : Using broadcasting -

A[((A[:,:2] - B)**2).sum(1) > R**2]

To get the points within radius R instead, replace > with <= in the solutions above (or with < if points lying exactly on the boundary should be excluded).
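As a quick sanity check, the six approaches can be run side by side on the question's data. This is only a sketch: the dictionary masks and the names P, d, outside and inside are introduced here for illustration, and R = 2 is taken from the question.

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[1., 1., 0.5],
              [2., 2., 0.7],
              [3., 4., 1.2],
              [4., 3., 2.33],
              [1., 2., 0.5],
              [6., 5., 0.3],
              [4., 5., 1.2],
              [5., 5., 1.5]])
B = np.array([2, 1])
R = 2

P = A[:, :2]   # the (x, y) coordinates only
d = P - B      # offsets from B, via broadcasting

# One boolean mask per approach; True means "farther than R from B".
masks = {
    "cdist": (cdist(P, B[None]) > R).ravel(),
    "einsum": np.einsum('ij,ij->i', d, d) > R**2,
    "norm": np.linalg.norm(d, axis=1) > R,
    "dot": (P**2).sum(1) + (B**2).sum() - 2 * P.dot(B) > R**2,
    "einsum+dot": np.einsum('ij,ij->i', P, P) + B.dot(B) - 2 * P.dot(B) > R**2,
    "broadcasting": (d**2).sum(1) > R**2,
}

# Every approach should select the same rows on this data.
reference = masks["broadcasting"]
for name, m in masks.items():
    assert np.array_equal(m, reference), name

outside = A[reference]    # rows farther than R from B
inside = A[~reference]    # rows at distance <= R from B
print(outside)
```

Inverting the mask with ~ is another way to get the points inside the radius without rewriting the comparison.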

Divakar

Another useful approach not mentioned by @Divakar is to use a cKDTree:

from scipy.spatial import cKDTree

# Find indices of points within radius
radius = 2
indices = cKDTree(A[:, :2]).query_ball_point(B, radius)

# Construct a mask over these points
mask = np.zeros(len(A), dtype=bool)
mask[indices] = True

# Extract rows that are not within the radius
A[~mask]

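The primary benefit shows up as A grows and the same tree is queried many times: once the tree is built, each query avoids computing a distance for every point in A. For a single center, building the tree (roughly O(n log n)) costs more than one direct O(n) pass, so the vectorized approaches may well be faster in that case.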
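A rough sketch of the case where the tree is reused is shown below. Only the center [2, 1] comes from the question; the second center [5, 5] and the names centers, neighbors and outside are hypothetical, added purely for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

A = np.array([[1., 1., 0.5],
              [2., 2., 0.7],
              [3., 4., 1.2],
              [4., 3., 2.33],
              [1., 2., 0.5],
              [6., 5., 0.3],
              [4., 5., 1.2],
              [5., 5., 1.5]])
radius = 2

# Hypothetical query centers; only [2, 1] appears in the question.
centers = np.array([[2, 1], [5, 5]])

# Build the tree once, then reuse it for every center.
tree = cKDTree(A[:, :2])

# With a 2-D array of query points, query_ball_point returns one list
# of row indices per center (points at distance <= radius).
neighbors = tree.query_ball_point(centers, radius)

outside = []
for idx in neighbors:
    mask = np.ones(len(A), dtype=bool)
    mask[idx] = False            # drop the rows inside the radius
    outside.append(A[mask])

for center, rows in zip(centers, outside):
    print(center, len(rows), "rows outside the radius")
```

The tree construction is paid for once, and each additional center only costs a query.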

jakevdp