
I have a matrix X on which I am trying to run KNN with the Pearson correlation as the distance metric. Is it possible to use the Pearson correlation as the metric in sklearn? I have tried something like this:

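import numpy as np
from scipy.stats import pearsonr
from sklearn.neighbors import NearestNeighbors

# Intended: matrix of (1 - Pearson r) distances between all rows of M
def pearson_calc(M):
    P = (1 - np.array([[pearsonr(a,b)[0] for a in M] for b in M]))
    return P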

nbrs = NearestNeighbors(n_neighbors=4, metric=pearson_calc)
nbrs.fit(X)
knbrs = nbrs.kneighbors(X)

However, this does not work; I get the following error:

pearson_affinity() takes 1 positional argument but 2 were given

I assume the pearson_calc function is wrong. Maybe it needs to take two parameters, a and b, rather than a matrix.

Mike El Jackson

1 Answer


Here is what the docs say on the matter:

If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them.
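A minimal sketch of a callable that follows this contract: the question's pearson_calc rewritten to take two rows instead of the whole matrix. The name pearson_distance and the random X are illustrative assumptions. algorithm="brute" is also an assumption of this sketch: for a callable metric, the default algorithm="auto" can fall back to a ball tree, whose pruning assumes a true metric, and 1 - r does not satisfy the triangle inequality.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.neighbors import NearestNeighbors


def pearson_distance(a, b):
    """Hypothetical helper: distance between two 1-D rows as 1 - Pearson r (range [0, 2])."""
    r, _ = pearsonr(a, b)
    return 1.0 - r


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))  # hypothetical data: 20 rows, 5 features

# sklearn calls pearson_distance(row_i, row_j) for each pair of rows.
# Brute force avoids tree pruning, which assumes the triangle inequality.
nbrs = NearestNeighbors(n_neighbors=4, metric=pearson_distance, algorithm="brute")
nbrs.fit(X)
distances, indices = nbrs.kneighbors(X)
```

Because X is also used as the query, each row's first neighbour is itself, at a distance of (almost exactly) 0.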

Additionally, valid values for metric are:

from scikit-learn:

['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan']

from scipy.spatial.distance:

['braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule']
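For reference, 'correlation' in the SciPy list is the Pearson-based entry: SciPy defines it as one minus the Pearson correlation coefficient of the two rows u and v, where \bar{u} and \bar{v} are the means of their elements:

```latex
d_{\mathrm{corr}}(u, v)
  = 1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}
             {\lVert u - \bar{u} \rVert_2 \, \lVert v - \bar{v} \rVert_2}
  = 1 - r_{uv}
```

It ranges from 0 (perfectly correlated rows) to 2 (perfectly anti-correlated rows).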

Two things:

  • Your function needs to take two arguments: the two rows between which the metric (distance) is to be computed. That explains why the error said two arguments were being passed to it.

  • You can use scipy.spatial.distance.correlation, which computes 1 minus the Pearson correlation coefficient of the two rows, as the metric like so:

      from scipy.spatial.distance import correlation
      from sklearn.neighbors import NearestNeighbors
      nbrs = NearestNeighbors(n_neighbors=4, metric=correlation)
    

    Source: sklearn NearestNeighbors documentation
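A slightly fuller sketch of the second point, on hypothetical random data: it passes SciPy's correlation as a callable and, for comparison, the equivalent string name 'correlation' from the docs' list. algorithm="brute" is an assumption of this sketch, set because 1 - r is not a true metric.

```python
import numpy as np
from scipy.spatial.distance import correlation
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
X = rng.normal(size=(30, 6))  # hypothetical data: 30 rows, 6 features

# SciPy's correlation() is called from Python once per pair of rows.
nbrs_callable = NearestNeighbors(n_neighbors=4, metric=correlation, algorithm="brute")
dist_callable, idx_callable = nbrs_callable.fit(X).kneighbors(X)

# The string name lets sklearn compute the same distances without a Python call per pair.
nbrs_string = NearestNeighbors(n_neighbors=4, metric="correlation", algorithm="brute")
dist_string, idx_string = nbrs_string.fit(X).kneighbors(X)
```

Both versions should report the same neighbours; the string form is usually much faster on large matrices.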

parsethis