scipy cdist or pdist on arrays of complex numbers

Question

The computation of a Euclidean distance between two complex numbers with scipy.spatial.distance.euclidean works:

import numpy
import scipy.spatial.distance
z1 = numpy.complex(numpy.cos(0), numpy.sin(0))
z2 = numpy.complex(numpy.cos(3*numpy.pi/2), numpy.sin(3*numpy.pi/2))
print scipy.spatial.distance.euclidean(z1, z2)

gives:

1.4142135623730951

However the pairwise distance matrix or the distance between each pair of the two input arrays doesn't work:

A = numpy.random.uniform(size=(5,1)) + numpy.random.uniform(size=(5,1))*1j
print scipy.spatial.distance.pdist(A)

returns a warning and the distances between the real parts:

lib/python2.7/site-packages/scipy/spatial/distance.py:107: ComplexWarning: Casting complex values to real discards the imaginary part
X = X.astype(np.double)
array([ 0.78016544,  0.66201108,  0.8330932 ,  0.54355982,  0.11815436,
        0.05292776,  0.23660562,  0.17108212,  0.11845125,  0.28953338])

It's the same with scipy.spatial.distance.cdist(A,A).

Is it possible to compute the pairwise distance matrix or the distance between each pair of the two input arrays using cdist or pdist, without using a for loop and scipy.spatial.distance.euclidean which is too slow for my problem?

there's something wrong. In the first example with `scipy.spatial.distance.euclidean`, you calculate the distance between two complex points. On the other hand, in the `pdist` example, the points have each 5 dimensions, with a complex number in each dimension. (the n. of dimensions is the length of the 2nd dimension of the input, see the docs for `pdist`) — gg349, Feb 21 '14 at 15:56
Yes, you are true, I change the dimension of A from (5,5) to (5,1) for clarity, but the warning is the same — bougui, Feb 21 '14 at 16:20

gg349 · Accepted Answer · 2014-02-21T18:03:12.773

The euclidean norm of a complex number is defined as the modulus of the number, and then you can define the distance between two complex numbers as the modulus of their difference.

The warning is there because pdist and cdist are designed for N-dimensional (scalar) spaces, where such notion of distance does not make any sense. (How do you deal with many dimensions, each of them containing a complex number? for scalars is pretty easy, but for complex you have a few options)

Given two collection of points:

A = numpy.random.uniform(size=(5)) + numpy.random.uniform(size=(5))*1j
B = numpy.random.uniform(size=(5)) + numpy.random.uniform(size=(5))*1j

The distance between each point of A and each point of B can be calculated as

MA = tile(A[:,newaxis],A.size)
MB = tile(B[:,newaxis],B.size)
dist = abs(MA-MB.T)

and you'll have for example in dist[2][3] the distance between the third point of the collection A and fourth point of the collection B.

This is very efficient, even more so if done in one step as @ali_m suggests in the comments,

dist = np.abs(A[:, None] - B[None, :])

If you just want the pairwise distance matrix of a single collection A, you can substitute B with A in the code above. The matrix dist will be symmetric and will be zero on the diagonal. So you'd be making about twice the number of operations you'd do in a loop, and you'd occupy about twice the memory required. Likely it would still be faster than a solution with a loop (also because with the loop you'd loop over pairs of numbers)

Note that tiling the input arrays is not necessary - you can reduce your overhead and memory footprint by using broadcasting instead, e.g. `dist = np.abs(A[:, None] - B[None, :])` — ali_m, Feb 21 '14 at 17:54

scipy cdist or pdist on arrays of complex numbers

1 Answers1

Linked