
I have two arrays of 2D coordinate points (x,y):

a = [ (x1,y1), (x2,y2), ... (xN,yN) ]
b = [ (X1,Y1), (X2,Y2), ... (XN,YN) ]

How can I find the Euclidean distance between each aligned pair (xi,yi) and (Xi,Yi), giving a 1xN array?

The scipy.spatial.distance.cdist function gives me the distances between all pairs, as an NxN array.

If I use a norm function to calculate the distances one pair at a time, it seems to be slow.
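For comparison, the one-pair-at-a-time approach described above might look like the sketch below. It assumes `a` and `b` hold N points each and can be converted to NumPy arrays of shape (N, 2); the function name `pairwise_loop` is hypothetical. Each iteration makes a separate Python-level call to `np.linalg.norm`, and that per-call overhead is likely where most of the time goes for large N.

```python
import numpy as np

def pairwise_loop(a, b):
    """Hypothetical slow baseline: distance between a[i] and b[i], one pair per iteration."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    out = np.empty(len(a))
    for i in range(len(a)):
        # One Python-level norm call per pair; this loop is what vectorizing removes.
        out[i] = np.linalg.norm(a[i] - b[i])
    return out

a = [(0, 0), (1, 1), (3, 4)]
b = [(3, 4), (1, 1), (0, 0)]
print(pairwise_loop(a, b))  # [5. 0. 5.]
```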

Is there a built in function to do this?

LWZ

3 Answers


I'm not seeing a built-in, but you could do it yourself pretty easily.

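import numpy as np
a, b = np.asarray(a), np.asarray(b)   # both of shape (N, 2)
distances = (a - b)**2                 # squared x and y differences
distances = distances.sum(axis=-1)     # sum over the last (coordinate) axis
distances = np.sqrt(distances)         # shape (N,): one distance per aligned pair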
user2357112
  • It amounts to the same, but it is faster to do the squaring and adding with `np.dot`: `delta = a-b; dist = np.dot(delta, delta); dist = np.sqrt(dist)` – Jaime Jul 30 '13 at 01:28
  • I don't think `dot` vectorizes like that; it computes matrix products for 2-d inputs. You could probably do something with `einsum`, but I don't know the Einstein summation convention, so it's hard for me to give answers using it. – user2357112 Jul 30 '13 at 01:31
  • Oops! You are absolutely right, it's `inner1d` that does it: `import numpy.core.umath_tests as ut; delta = a-b; dist = np.sqrt(ut.inner1d(delta, delta))`. Alternatively `dist = np.sqrt(np.einsum('ij, ij->i', delta, delta))`. – Jaime Jul 30 '13 at 01:43
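A quick way to check that the formulations in this answer and its comments agree: the sketch below computes the row-wise distances with the explicit square-sum-sqrt, with `np.einsum('ij,ij->i', ...)`, and with `np.linalg.norm(..., axis=-1)`. It skips `inner1d`, because `numpy.core.umath_tests` is an internal testing module that NumPy has since deprecated. It also illustrates the `np.dot` point from the comments. The random data is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # made-up test data
a = rng.standard_normal((100, 2))
b = rng.standard_normal((100, 2))
delta = a - b                           # shape (N, 2)

d_manual = np.sqrt((delta ** 2).sum(axis=-1))            # square, sum, sqrt
d_einsum = np.sqrt(np.einsum('ij,ij->i', delta, delta))  # row-wise dot products
d_norm = np.linalg.norm(delta, axis=-1)                  # row-wise 2-norm

assert np.allclose(d_manual, d_einsum)
assert np.allclose(d_manual, d_norm)

# np.dot(delta, delta) is a matrix product for 2-D inputs, so (N, 2) by (N, 2)
# raises ValueError unless N == 2. delta @ delta.T does work, but it builds an
# (N, N) matrix whose diagonal holds the squared distances: wasteful for large N.
gram = delta @ delta.T
assert np.allclose(np.sqrt(np.diag(gram)), d_manual)
print(d_norm.shape)  # (100,)
```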

`np.hypot` is another valid alternative for 2D points:

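from numpy import hypot, sqrt
from numpy.random import randn

a, b = randn(10, 2), randn(10, 2)
ahat, bhat = (a - b).T   # x and y differences as two 1-D arrays
r = hypot(ahat, bhat)    # elementwise sqrt(ahat**2 + bhat**2)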
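`np.hypot(x, y)` computes `sqrt(x**2 + y**2)` elementwise, so for points with exactly two coordinates it gives the same result as the square-sum-sqrt formula; transposing `a - b` splits the (N, 2) difference into its x and y columns. A small sanity check with made-up points:

```python
import numpy as np

a = np.array([[0.0, 0.0], [1.0, 2.0], [-1.0, 5.0]])
b = np.array([[3.0, 4.0], [1.0, 2.0], [5.0, -3.0]])

dx, dy = (a - b).T            # unpack the two columns of the (N, 2) difference
r = np.hypot(dx, dy)          # elementwise sqrt(dx**2 + dy**2)

# Same values as the manual square-sum-sqrt version.
assert np.allclose(r, np.sqrt(((a - b) ** 2).sum(-1)))
print(r)  # [ 5.  0. 10.]
```

Because `hypot` takes exactly two arguments, this form is specific to 2D points; the square-sum-sqrt form works for any number of coordinates.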

Timings (IPython `%timeit`) of the manual calculation versus hypot:

Manual:

%timeit sqrt(((a - b) ** 2).sum(-1))
100000 loops, best of 3: 10.3 µs per loop

Using hypot:

%timeit hypot(ahat, bhat)
1000000 loops, best of 3: 1.3 µs per loop

Now how about some adult-sized arrays:

a, b = randn(10**7, 2), randn(10**7, 2)
ahat, bhat = (a - b).T

%timeit -r10 -n3 hypot(ahat, bhat)
3 loops, best of 10: 208 ms per loop

%timeit -r10 -n3 sqrt(((a - b) ** 2).sum(-1))
3 loops, best of 10: 224 ms per loop

Not much of a performance difference between the two methods. You can squeeze out a little more from the manual version by multiplying instead of using `**` (pow):

d = a - b

%timeit -r10 -n3 sqrt((d * d).sum(-1))
3 loops, best of 10: 184 ms per loop
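The timings above rely on IPython's `%timeit` magic. Outside IPython, a comparable comparison can be sketched with the standard-library `timeit` module. The array size and repeat counts below are arbitrary choices, and results depend on the machine and NumPy version, so no numbers are shown here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n = 10**6                              # arbitrary size; the answer above used 10**7
a = rng.standard_normal((n, 2))
b = rng.standard_normal((n, 2))
d = a - b
dx, dy = d.T

candidates = {
    "hypot": lambda: np.hypot(dx, dy),
    "sqrt(((a - b) ** 2).sum(-1))": lambda: np.sqrt(((a - b) ** 2).sum(-1)),
    "sqrt((d * d).sum(-1))": lambda: np.sqrt((d * d).sum(-1)),
}

# Check that all three agree before comparing their speed.
reference = candidates["hypot"]()
for name, fn in candidates.items():
    assert np.allclose(fn(), reference), name

for name, fn in candidates.items():
    # Best of 5 repeats, each running the function 3 times; report time per call.
    best = min(timeit.repeat(fn, number=3, repeat=5)) / 3
    print(f"{name:32s} {best * 1e3:8.2f} ms per call")
```

As in the answer's own comparison, the `hypot` and `d * d` variants reuse a precomputed difference, while the `(a - b) ** 2` variant also pays for the subtraction on every call.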
Phillip Cloud

Try adding [:, np.newaxis, :] to the first operand so that it broadcasts against the second, then let np.linalg.norm reduce over the last axis:

np.linalg.norm(grid[:, np.newaxis, :] - scenario.target, axis=-1)

Reference: "Numpy Broadcast to perform euclidean distance vectorized"
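Note that `x[:, np.newaxis, :] - y` broadcasts every point of `x` against every point of `y`, so this form returns an (N, M) matrix of all pairwise distances, similar to what `cdist` gives. For the aligned pairs asked about in the question, the plain row-wise difference is enough. A small sketch contrasting the two, where `x` and `y` are hypothetical stand-ins for `grid` and `scenario.target`:

```python
import numpy as np

x = np.array([[0.0, 0.0], [3.0, 4.0]])   # hypothetical stand-in for grid
y = np.array([[3.0, 4.0], [6.0, 8.0]])   # hypothetical stand-in for scenario.target

# Broadcasting (2, 1, 2) against (2, 2) gives (2, 2, 2): every x[i] minus every y[j].
all_pairs = np.linalg.norm(x[:, np.newaxis, :] - y, axis=-1)  # values: [[5, 10], [0, 5]]
aligned = np.linalg.norm(x - y, axis=-1)                      # x[i] vs y[i] only

# The aligned distances sit on the diagonal of the all-pairs matrix.
assert np.allclose(np.diag(all_pairs), aligned)
print(aligned)  # [5. 5.]
```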

vozman