3

How can I calculate the element-wise euclidean distance between 2 numpy arrays? For example; I have 2 arrays both of dimensions 3x3 (known as array A and array B) and I want to calculate the euclidean distance between value A[0,0] and B[0,0]. Then I want to calculate the euclidean distance between value A[0,1] and B[0,1]. And so on. So the output array would be 3x3 aswell.

If I try to use scipy.spatial.distancecdist I get the error ValueError: XA must be a 2-dimensional array.

import numpy as np
from scipy.spatial.distance import cdist

a = np.array([
    [(0,255,0),(255,255,0),(0,255,0)],
    [(0,255,0),(255,255,0),(0,255,0)],
    [(0,255,0),(255,255,0),(0,255,0)],
])

b = np.array([
    [(255,255,0),(255,255,0),(0,255,0)],
    [(255,255,0),(255,255,0),(0,255,0)],
    [(255,255,0),(255,255,0),(0,255,0)],
])

dists = cdist(a, b, 'euclidean')
print(dists)
  • I would really like to use a scipy function because I can easily use a different distance measure with their functions. For example; cdist(a,b,'cityblock'), cdist(a,b,'sqeuclidean'), etc.

Edit My desired output is like so (the maths has been made up but the array dimensions are correct 3x3):

[[100, 0, 100]
[100, 0, 100]
[100, 0, 100]]

Ie, I am expecting:

[[cdist((0,255,0), (255,255,0)), cdist((0,255,0), (255,255,0)), cdist((0,255,0), (255,255,0)), 
 [...]
 [...]]
Divakar
  • 218,885
  • 19
  • 262
  • 358
sazr
  • 24,984
  • 66
  • 194
  • 362

3 Answers3

1

Few ways to do so are listed next.

Approach #1

Inspired by this post, we can solve it in a vectorized manner. So, following the wiki contents from eucl_dist package (disclaimer: I am its author), we could leverage matrix-multiplication and some NumPy specific implementations, like so -

def elementwise_cdist_v1(a,b):
    s_a = np.einsum('ijk,ijk->ij',a,a)
    s_b = np.einsum('ijk,ijk->ij',b,b)
    return np.sqrt(s_a+s_b-2*np.einsum('ijk,ijk->ij',a,b))

Approach #2

This is using np.einsum and implemented like so -

def elementwise_cdist_v2(a,b):
    d = a-b
    return np.sqrt(np.einsum('ijk,ijk->ij',d,d))

Timings on large arrays -

We are using random data with last axis of length=3, as that's the usual case when working with xyz coordinate data as it seems to be the case here.

In [72]: np.random.seed(0)
    ...: a = np.random.rand(1000,1000,3)
    ...: b = np.random.rand(1000,1000,3)

In [73]: %timeit elementwise_cdist_v1(a,b)
10 loops, best of 3: 23.9 ms per loop

In [74]: %timeit elementwise_cdist_v2(a,b)
100 loops, best of 3: 13.2 ms per loop
Divakar
  • 218,885
  • 19
  • 262
  • 358
0

Reduce the dimensionality of your input arrays and it works.

import numpy as np
from scipy.spatial.distance import cdist

a = np.array([
    (0,255,0),
    (255,255,0),
    (0,255,0),
    (0,255,0),
    (255,255,0),
    (0,255,0),
    (0,255,0),
    (255,255,0),
    (0,255,0),
])

b = np.array([
    (255,255,0),
    (255,255,0),
    (0,255,0),
    (255,255,0),
    (255,255,0),
    (0,255,0),
    (255,255,0),
    (255,255,0),
    (0,255,0),
])

dist_matrix=cdist(a,b)
pair_dist=np.diag(dist_matrix,0)

dist3x3=np.reshape(pair_dist,(3,3))
print("pair_dist\n",pair_dist)
print("dist3x3\n",dist3x3)

Output:

dist_matrix
[[255. 255.   0. 255. 255.   0. 255. 255.   0.]
 [  0.   0. 255.   0.   0. 255.   0.   0. 255.]
 [255. 255.   0. 255. 255.   0. 255. 255.   0.]
 [255. 255.   0. 255. 255.   0. 255. 255.   0.] 
 [  0.   0. 255.   0.   0. 255.   0.   0. 255.] 
 [255. 255.   0. 255. 255.   0. 255. 255.   0.]
 [255. 255.   0. 255. 255.   0. 255. 255.   0.]
 [  0.   0. 255.   0.   0. 255.   0.   0. 255.]
 [255. 255.   0. 255. 255.   0. 255. 255.   0.]]

dist
 [255.   0.   0. 255.   0.   0. 255.   0.   0.]
dist3x3
 [[255.   0.   0.]
 [255.   0.   0.]
 [255.   0.   0.]]
visibleman
  • 3,175
  • 1
  • 14
  • 27
  • maybe my description is incorrect but I am expecting a different output from the comparison. Please see my (upcoming) edit. – sazr Nov 28 '19 at 01:01
  • You cannot control the requirements or output of cdist, so you have to reshape your input to match cdist api, and then you have extract and reshape the output the way you want it. See last edit – visibleman Nov 28 '19 at 01:16
0

First a simple NumPy solution for the Euclidean case:

>>> np.sqrt(np.sum((a-b)**2, axis=2))
array([[255.,   0.,   0.],
       [255.,   0.,   0.],
       [255.,   0.,   0.]])

You say that you want to use cdist instead. Note that using cdist for element-wise distance calculation is wasteful, as this function computes distances between all pairs of elements. However if performance is not an issue, try this:

>>> np.diag(cdist(a.reshape(-1, 3), b.reshape(-1, 3), 'euclidean')).reshape(-1, 3)
array([[255.,   0.,   0.],
       [255.,   0.,   0.],
       [255.,   0.,   0.]])

Edit: A solution whose memory requirements will scale more reasonably with the size of the arrays:

>>> np.array([
        cdist(x, y, 'euclidean')
        for (x, y) in zip(a.reshape(-1, 1, 3), b.reshape(-1, 1, 3))
    ]).reshape(-1, 3)
array([[255.,   0.,   0.],
       [255.,   0.,   0.],
       [255.,   0.,   0.]])
Seb
  • 4,422
  • 14
  • 23
  • thanks. Performance is quite important. Do you know of a different distance function in scipy that is more efficient? – sazr Nov 28 '19 at 01:27
  • @Seb using `pdist` instead of `cdist` in my code results in `ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()` any advice whats the issue? `dists1 = pdist(a.reshape(-1, 3), b.reshape(-1, 3), 'euclidean')` – sazr Nov 28 '19 at 01:38
  • `pdist` is not going to work, as it computes distances between all pairs of a single given array. – Seb Nov 28 '19 at 01:38
  • @sazr I've added another solution that is more memory-efficient for large arrays. A direct vectorised NumPy implementation for each of the metrics would still be more efficient, though. It is actually kind of shocking that none of the distance functions in `scipy.spatial.distance` can be used for direct one-to-one distance calculation of two arrays of vectors. – Seb Nov 28 '19 at 01:50