How to get the m pairs of points among n points that have the largest distance between them

Question

Say I have the following points defined in a one dimensional space:

x = np.array([[0.70710678],
             [0.70710678],
             [0.        ],
             [1.41421356]])

I want to get the m pairs of points among these n points that have the longest Euclidean distance between them (if m is 1, the pair in this case would be 1.4142 and 0).

I tried getting the pairwise distances with:

from scipy.spatial.distance import pdist, cdist

cdist(x,x, 'seuclidean')

From here, however, I'm not sure how to do the rest.

Divakar · Accepted Answer · 2019-11-04T12:52:33.567

We could use np.argpartition on the flattened distances from the cdist result -

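import numpy as np
from scipy.spatial.distance import cdist
# keep only the upper triangle so that each pair (i < j) appears once
dists = np.triu(cdist(x,x, 'seuclidean'),1)
s = dists.shape
idx = np.vstack(np.unravel_index(np.argpartition(dists.ravel(),-m)[-m:],s)).T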

idx would be m pairs of indexes that are farthest, i.e. each row of idx would represent indexes of one pair from x.
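
As a self-contained variant of the same idea (a sketch, not the code from the answer), pdist returns each pair i < j exactly once, so the zeros that np.triu leaves in the lower triangle never enter the selection. The function name farthest_pairs and its metric parameter are hypothetical, introduced only for illustration. The default 'seuclidean' mirrors the call above; 'euclidean' can be passed for plain Euclidean distance.

```python
import numpy as np
from scipy.spatial.distance import pdist


def farthest_pairs(x, m, metric="seuclidean"):
    """Return an (m, 2) array of index pairs (i < j) with the largest distances.

    Hypothetical helper for illustration; assumes 1 <= m <= n * (n - 1) / 2.
    """
    n = len(x)
    d = pdist(x, metric)                 # condensed distances, one entry per pair i < j
    i, j = np.triu_indices(n, k=1)       # pair indices in the same order as pdist
    top = np.argpartition(d, -m)[-m:]    # positions of the m largest distances (unordered)
    return np.column_stack((i[top], j[top]))


x = np.array([[0.70710678], [0.70710678], [0.0], [1.41421356]])
print(farthest_pairs(x, 1))  # [[2 3]]
```

For larger m on this sample, several pairs are tied in distance, so which of the tied pairs is returned is up to argpartition.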

Sample run -

# with m = 1
In [144]: idx
Out[144]: array([[2, 3]])

# with m = 2    
In [147]: idx
Out[147]: 
array([[1, 2],
       [2, 3]])

# with m = 3        
In [150]: idx
Out[150]: 
array([[0, 3],
       [1, 2],
       [2, 3]])

Sample run on 2D array -

In [44]: x
Out[44]: 
array([[1.25, 1.25],
       [1.25, 1.25],
       [1.87, 1.87],
       [0.62, 0.62],
       [0.62, 0.62],
       [1.25, 1.25],
       [0.  , 0.  ],
       [0.62, 0.62]])

In [45]: m = 2

In [46]: dists
Out[46]: 
array([[0.  , 0.  , 1.58, 1.58, 1.58, 0.  , 3.16, 1.58],
       [0.  , 0.  , 1.58, 1.58, 1.58, 0.  , 3.16, 1.58],
       [0.  , 0.  , 0.  , 3.16, 3.16, 1.58, 4.74, 3.16],
       [0.  , 0.  , 0.  , 0.  , 0.  , 1.58, 1.58, 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.  , 1.58, 1.58, 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 3.16, 1.58],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 1.58],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]])

In [47]: idx
Out[47]: 
array([[0, 6],
       [2, 6]])

Note that because argpartition only partitions the values rather than fully sorting them, the rows of idx are not guaranteed to be ordered by distance. To sort them by increasing distance, we could do -

idx[dists[tuple(idx.T)].argsort()]
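
The expression above orders the selected pairs from the smallest of the chosen distances to the largest. If the farthest pair should come first, the argsort result can be reversed. The following is a minimal sketch on the 1D sample from the question; the names pair_d and idx_desc are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[0.70710678], [0.70710678], [0.0], [1.41421356]])
m = 3

# upper-triangular distances and the m farthest pairs, as in the answer
dists = np.triu(cdist(x, x, "seuclidean"), 1)
flat_top = np.argpartition(dists.ravel(), -m)[-m:]
idx = np.vstack(np.unravel_index(flat_top, dists.shape)).T

# distance of each selected pair, then reorder rows from farthest to nearest
pair_d = dists[tuple(idx.T)]
idx_desc = idx[pair_d.argsort()[::-1]]

print(idx_desc[0])  # [2 3], the single farthest pair
```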

If, for example, my points are 2-dimensional: array([[1.246, 1.246], [1.246, 1.246], [1.869, 1.869], [0.623, 0.623], [0.623, 0.623], [1.246, 1.246], [0. , 0. ], [0.623, 0.623]]), I don't exactly get the indices of the 2 furthest pairs – Alejandro, Nov 04 '19 at 12:42
@Azerila Seems to be working fine. Added a `Sample run on 2D array` section. Please check it out. – Divakar, Nov 04 '19 at 12:52

score 1 · Answer 2 · answered Nov 04 '19 at 11:20

To pair each point with its furthest counterpart, you can use:

np.dstack((x, x[cdist(x,x, 'seuclidean').argmax(axis=-1)]))

#array([[[0.70710678, 0.        ]],
#
#       [[0.70710678, 0.        ]],
#
#       [[0.        , 1.41421356]],
#
#       [[1.41421356, 0.        ]]])
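
To see which indices are being paired, and to collapse mirrored pairs such as (2, 3) and (3, 2), the same argmax result can be kept as indices instead of values. This is only a sketch built on the call above, with illustrative names far, pairs and unique_pairs. It still yields each point's farthest partner, not the m farthest pairs overall.

```python
import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[0.70710678], [0.70710678], [0.0], [1.41421356]])

# for each point, the index of the point farthest from it
far = cdist(x, x, "seuclidean").argmax(axis=-1)

# one (i, farthest-of-i) index pair per point
pairs = np.column_stack((np.arange(len(x)), far))

# sort indices within each pair so (i, j) and (j, i) become identical, then drop duplicates
unique_pairs = np.unique(np.sort(pairs, axis=1), axis=0)
print(unique_pairs)
```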

answered Nov 04 '19 at 11:20

zipa

In the output, the same pairs are repeated in a different order; I am only looking for the m longest pairs – Alejandro Nov 04 '19 at 11:56
