I have a numpy array points of shape [N,2] which contains the (x,y) coordinates of N points. I'd like to compute the mean distance of every point to all other points using an existing function (which we'll call cmp_dist and which I just use as a black box).
First, a verbose solution in "normal" Python to illustrate what I want to do (written off the top of my head):
import numpy as np

mean_dist = []
for i, (x0, y0) in enumerate(points):
    dist = []
    for j, (x1, y1) in enumerate(points):
        if i == j:
            continue  # skip the distance of a point to itself
        dist.append(cmp_dist(x0, y0, x1, y1))
    mean_dist.append(np.array(dist).mean())
I already found a "better" solution using list comprehensions (assuming list comprehensions are usually better) which seems to work just fine:
mean_dist = [np.array([cmp_dist(x0, y0, x1, y1) for j, (x1, y1) in enumerate(points) if i != j]).mean()
             for i, (x0, y0) in enumerate(points)]
However, I'm sure there's a much better solution for this in pure numpy, hopefully some function that applies an operation to every element using all other elements.
How can I write this code in pure numpy/scipy?
I tried to find something myself, but this is quite hard to google without knowing what such operations are called (my respective math classes were quite a while back).
Edit: Not a duplicate of "Fastest pairwise distance metric in python".
The author of that question has a 1D array r and is satisfied with what scipy.spatial.distance.pdist(r, 'cityblock') returns (an array containing the distances between all points). However, pdist returns a flat array, that is, it is not clear which of the distances belong to which point (see my answer).
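To make the "flat array" point concrete, here is a minimal sketch of how pdist lays out its result. The sample points are made up for illustration, and the 'cityblock' metric is simply the one used in the linked question, not necessarily what cmp_dist computes.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical sample data: 4 points on the x axis
points = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [6.0, 0.0]])

# Condensed distance vector: one entry per unordered pair (i, j) with i < j
flat = pdist(points, 'cityblock')

# The entries follow the row-major upper triangle:
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
n = len(points)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

for (i, j), d in zip(pairs, flat):
    print(f"points {i} and {j}: {d}")
```

The result has N*(N-1)/2 entries rather than one row per point, so the point-to-distance mapping has to be rebuilt from the pair order.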
(Although, as explained in that answer, pdist is what I was ultimately looking for, it doesn't solve the problem as I've specified it in the question.)
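For reference, one way the flat pdist result can be turned into a per-point mean is sketched below. It assumes Euclidean distance as a stand-in for the black-box cmp_dist (the question does not say what cmp_dist actually computes), and the sample points are made up.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical sample data: 3 collinear points
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Assumption: Euclidean distance stands in for the black-box cmp_dist
# squareform expands the condensed vector into a symmetric (N, N) matrix
# with zeros on the diagonal, so row i holds the distances from point i
dist_matrix = squareform(pdist(points, 'euclidean'))

n = len(points)
# The zero on the diagonal adds nothing to the row sum,
# so dividing by N - 1 gives the mean over all *other* points
mean_dist = dist_matrix.sum(axis=1) / (n - 1)

print(mean_dist)
```

If cmp_dist is not one of the built-in metrics, pdist also accepts a callable taking two 1D arrays, so a wrapper such as lambda u, v: cmp_dist(u[0], u[1], v[0], v[1]) should work, although it is then called once per pair from Python and will likely be much slower.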