I need to run k-nearest neighbors on a set of vectors using cosine similarity and some other user-defined metrics. How can I do that with scikit-learn? I found sklearn.neighbors.KNeighborsClassifier, but I could not find any option for user-defined metrics. I am using the latest version, scikit-learn 0.11.

ogrisel
lipid

1 Answer

It is not (yet?) possible to pass precomputed distances or lazily computed user-defined distance functions to the kNN models.

However, in the master branch you can now use an arbitrary p for the Minkowski distance of order p:

https://github.com/scikit-learn/scikit-learn/pull/742
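For reference, the Minkowski distance of order p generalizes the Manhattan (p = 1) and Euclidean (p = 2) distances. The snippet below is only a plain-Python sketch of the formula, not the scikit-learn implementation; the function name `minkowski_distance` is made up for illustration.

```python
def minkowski_distance(u, v, p=2.0):
    """Minkowski distance of order p between two equal-length sequences.

    Hypothetical helper for illustration only (not a scikit-learn function).
    """
    if p < 1:
        # For p < 1 the triangle inequality fails, so it is not a true metric.
        raise ValueError("p must be >= 1")
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # Sum of |u_i - v_i|^p, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


# p=1 is the Manhattan distance, p=2 the Euclidean distance.
manhattan = minkowski_distance([0, 0], [3, 4], p=1)
euclidean = minkowski_distance([0, 0], [3, 4], p=2)
```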

It would be quite easy to allow an arbitrary user-defined distance function for the brute-force method. The ball tree implementation (used for low-dimensional data), however, cannot be adapted to the general case that easily.
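Until that is supported, a brute-force search with your own distance function is short enough to write by hand. The following is a minimal plain-Python sketch, assuming dense vectors stored as lists and majority voting among the k nearest neighbors; `cosine_distance` and `brute_force_knn_predict` are hypothetical names, not part of scikit-learn.

```python
import heapq
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def brute_force_knn_predict(train_X, train_y, query, k, distance):
    """Predict a label by majority vote among the k nearest training points.

    `distance` is any callable taking two vectors and returning a number.
    Every training point is compared to the query: O(n) distance calls.
    """
    scored = ((distance(x, query), label) for x, label in zip(train_X, train_y))
    nearest = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train_X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = ["a", "a", "b", "b"]
predicted = brute_force_knn_predict(train_X, train_y, [1.0, 0.2], k=3,
                                    distance=cosine_distance)
```

Because the distance is just a callable, any other user-defined metric can be swapped in without touching the search code.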

Also, for sparse positive data and cosine similarity, an inverted index would be a better data structure; see: http://metaoptimize.com/qa/questions/9691/efficient-nearest-neighbors-in-a-very-sparse-settings
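To illustrate the inverted index idea, here is a rough plain-Python sketch, assuming each vector is a dict mapping feature name to a positive weight. The names `build_inverted_index` and `top_k_cosine` are invented for this example. With non-negative weights, vectors that share no feature with the query have cosine similarity 0, so only the posting lists of the query's own features need to be scanned.

```python
import heapq
import math
from collections import defaultdict


def build_inverted_index(vectors):
    """Map each feature to the (doc_id, weight) pairs where it is non-zero."""
    postings = defaultdict(list)
    norms = []
    for doc_id, vec in enumerate(vectors):
        norms.append(math.sqrt(sum(w * w for w in vec.values())))
        for feature, weight in vec.items():
            if weight != 0:
                postings[feature].append((doc_id, weight))
    return postings, norms


def top_k_cosine(query, postings, norms, k):
    """Return up to k (similarity, doc_id) pairs, most similar first."""
    q_norm = math.sqrt(sum(w * w for w in query.values()))
    if q_norm == 0.0:
        return []
    # Accumulate dot products only for documents sharing a feature
    # with the query; all other documents are never touched.
    dots = defaultdict(float)
    for feature, q_weight in query.items():
        for doc_id, weight in postings.get(feature, ()):
            dots[doc_id] += q_weight * weight
    sims = ((dot / (q_norm * norms[doc_id]), doc_id)
            for doc_id, dot in dots.items())
    return heapq.nlargest(k, sims)


vectors = [{"a": 1.0, "b": 1.0}, {"c": 2.0}, {"a": 3.0}]
postings, norms = build_inverted_index(vectors)
neighbors = top_k_cosine({"a": 1.0}, postings, norms, k=2)
```

The cost of a query depends on the length of the posting lists it touches rather than on the total number of vectors, which is where the gain on very sparse data comes from.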

ogrisel