I have been using scipy.spatial.distance.pdist(...) in Python, which has proven useful and fast for some of the applications I have been working on.

I need to use a custom pairwise distance function rather than one of the standard metrics selected by the metric argument. As a simple example, suppose I do not want to use the built-in Euclidean distance, as in the following:

 Y = pdist(X, 'euclidean')

Instead, I want to define the Euclidean distance function myself and pass it as an argument to pdist(). How can I pass my own implementation of the Euclidean distance to pdist() and get exactly the same results? The answer will show me how to use pdist() with my own metrics.

I know how to do this with pdist() in MATLAB, but not yet in Python. Thanks for any suggestions.

AlexV
Yas

1 Answer

There is an example in the documentation for pdist:

import numpy as np
from scipy.spatial.distance import pdist

# X is an m-by-n array: m observations (rows) in n dimensions (columns)
dm = pdist(X, lambda u, v: np.sqrt(((u-v)**2).sum()))

If you want to use a regular function instead of a lambda function, the equivalent would be:

import numpy as np
from scipy.spatial.distance import pdist

def dfun(u, v):
    return np.sqrt(((u-v)**2).sum())

dm = pdist(X, dfun)
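
To check that the custom function reproduces the built-in metric, something like the following sketch should work. The array X here is hypothetical random data (5 observations, 3 features) made up for illustration; it is not from the question.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def dfun(u, v):
    # Custom Euclidean distance between two 1-D observation vectors.
    return np.sqrt(((u - v) ** 2).sum())

# Hypothetical sample data: 5 observations (rows), 3 features (columns).
rng = np.random.default_rng(0)
X = rng.random((5, 3))

dm_custom = pdist(X, dfun)           # callable metric
dm_builtin = pdist(X, 'euclidean')   # built-in string metric

# Compare the two condensed distance vectors up to floating-point tolerance.
same = np.allclose(dm_custom, dm_builtin)

# Optional: expand the condensed vector into a full symmetric 5 x 5 matrix.
square = squareform(dm_custom)
```

Note that pdist calls a Python callable once per pair of rows, so a custom function will generally be slower than the equivalent built-in string metric on large inputs.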
Pelle Nilsson

How can the new distance metric be used in seaborn's hierarchical clustering function `sns.clustermap(X, metric=dm)`? Trying just this produced an error when X is a square matrix. – develarist Aug 17 '20 at 15:18