I need to run k-nearest neighbors on a set of vectors using the cosine similarity metric and some other user-defined metrics. How can I achieve that with scikit-learn? I found sklearn.neighbors.KNeighborsClassifier,
but I could not find any option for user-defined metrics. I am currently using the latest version, scikit-learn 0.11.
1 Answer
It is not (yet?) possible to pass precomputed or lazily computed user-defined distance functions to the kNN models.
However, the master branch now lets you use an arbitrary p for the p-Minkowski distance:
https://github.com/scikit-learn/scikit-learn/pull/742
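For reference, the p-Minkowski distance itself is simple to write down. Here is a minimal NumPy sketch of the formula only; the function name `minkowski_distance` is made up for illustration and is not the code from the pull request:

```python
import numpy as np


def minkowski_distance(x, y, p=2.0):
    """p-Minkowski distance: (sum_i |x_i - y_i|**p) ** (1 / p).

    Hypothetical helper for illustration; assumes p >= 1 and that
    x and y have the same length.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


# p=1 is the Manhattan distance, p=2 the Euclidean distance.
d_manhattan = minkowski_distance([0, 0], [3, 4], p=1)
d_euclidean = minkowski_distance([0, 0], [3, 4], p=2)
```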
It would be quite easy to allow an arbitrary user-defined distance function for the brute-force method; however, the ball tree implementation (for low-dimensional data) cannot be adapted that easily to the general case.
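In the meantime, a brute-force search with your own distance function is easy to write by hand. The following is only a rough sketch in plain NumPy, not scikit-learn code; `cosine_distance` and `brute_force_knn_predict` are hypothetical names, and the sketch assumes no vector is all zeros:

```python
from collections import Counter

import numpy as np


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def brute_force_knn_predict(X_train, y_train, x_query, k, distance):
    """Classify x_query by majority vote among its k nearest training rows.

    `distance` is any callable taking two 1-D arrays and returning a float.
    Cost is one distance evaluation per training row (no index structure).
    """
    dists = np.array([distance(row, x_query) for row in X_train])
    nearest = np.argsort(dists, kind="stable")[:k]  # indices of the k smallest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


X_train = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y_train = ["a", "a", "b", "b"]
label = brute_force_knn_predict(
    X_train, y_train, np.array([1.0, 0.2]), k=3, distance=cosine_distance
)
```

Any other metric can be swapped in by passing a different callable as `distance`. This scans every training vector for each query, so it is only practical for small to medium datasets.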
Also, for sparse positive data and cosine similarity, an inverted index would be a better data structure; see: http://metaoptimize.com/qa/questions/9691/efficient-nearest-neighbors-in-a-very-sparse-settings
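To give an idea of what the inverted index approach looks like, here is a small pure-Python sketch. It is only an illustration under my own assumptions (vectors stored as `{feature: weight}` dicts with positive weights; `build_inverted_index` and `top_k_cosine` are made-up names), not the method from the linked discussion:

```python
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each feature to the (doc_id, weight) pairs that contain it.

    `docs` is a list of sparse vectors stored as {feature: weight} dicts.
    Also returns the Euclidean norm of every vector.
    """
    index = defaultdict(list)
    norms = []
    for doc_id, doc in enumerate(docs):
        norms.append(math.sqrt(sum(w * w for w in doc.values())))
        for feature, w in doc.items():
            index[feature].append((doc_id, w))
    return index, norms


def top_k_cosine(query, index, norms, k):
    """Return up to k (doc_id, cosine similarity) pairs, best first.

    Only vectors sharing at least one feature with the query are touched,
    which is where the saving on very sparse data comes from.
    """
    q_norm = math.sqrt(sum(w * w for w in query.values()))
    dots = defaultdict(float)
    for feature, qw in query.items():
        for doc_id, w in index.get(feature, ()):
            dots[doc_id] += qw * w  # accumulate the sparse dot product
    scored = [(dot / (q_norm * norms[d]), d) for d, dot in dots.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(d, s) for s, d in scored[:k]]


docs = [{"a": 1.0, "b": 1.0}, {"b": 1.0}, {"c": 2.0}]
index, norms = build_inverted_index(docs)
neighbors = top_k_cosine({"b": 1.0}, index, norms, k=2)
```

Vectors that share no feature with the query have cosine similarity 0 and are never visited, so they simply do not appear in the result.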

ogrisel