Why doesn't metric='precomputed' work in scikit-learn's k-nearest neighbours?

Question

I'm trying to fit a precomputed kernel matrix with KNeighborsClassifier (http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html). This appears to be possible, since the metric 'precomputed' exists. It allows you to pass an n_samples * n_samples kernel matrix to the fit method.

When using it, here's what I get:

ValueError: Metric 'precomputed' not valid for algorithm 'auto'

I don't understand how using algorithm 'auto' to find nearest neighbours is not compatible with the fact that I'm using a precomputed kernel matrix.

EDIT:

Unfortunately my question didn't get any attention. I've looked into the source code more deeply, and this looks like a bug: when you pass metric='precomputed', the code should still allow algorithm='auto'. Instead, it raises the ValueError I mentioned, and I don't think the author intended that behaviour. I have no idea how to change the source code so that it behaves properly.

I also want to add that, from a more theoretical point of view, it is completely justified to use a kernel matrix (a.k.a. Gram matrix) with the fit method of kNN. You can derive the distance matrix from the Gram matrix, and then, to predict a new data point, you just find its k nearest neighbours and assign it the most frequent label among them.
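The step from Gram matrix to distance matrix can be sketched as follows. This is only an illustration, assuming the usual feature-space identity d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y); the helper name gram_to_distance and the toy data are made up for the example and are not part of scikit-learn.

```python
import numpy as np

def gram_to_distance(K, diag_rows, diag_cols):
    """Turn kernel values into distances in the kernel's feature space.

    K[i, j] = k(x_i, y_j), diag_rows[i] = k(x_i, x_i), diag_cols[j] = k(y_j, y_j).
    (Hypothetical helper, not a scikit-learn function.)
    """
    sq = diag_rows[:, None] + diag_cols[None, :] - 2.0 * K
    # Clip tiny negative values that can appear from floating-point round-off.
    return np.sqrt(np.maximum(sq, 0.0))

# Toy data with a linear kernel, so the result can be compared with
# ordinary Euclidean distances.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
K = X @ X.T
D = gram_to_distance(K, np.diag(K), np.diag(K))
print(D)
```

For new points, the same formula applies with K holding the kernel values between the new points (rows) and the training points (columns).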

I really think this question should get an answer. It is properly asked and I want something really precise, and I know that someone with a deeper understanding of Python and the scikit-learn library should be able to answer it. Maybe I'm missing something obvious, but I also think an answer would help anyone trying to use kNN with a precomputed kernel matrix (which is not an isolated case).

score 1 · Accepted Answer · answered Jun 30 '17 at 13:38

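I guess this is way too late a reply, but in case you are still wondering: 'auto' won't work in your version because it resolves to one of the tree-based algorithms, and neither KDTree nor BallTree accepts a precomputed metric (BallTree does accept a user-defined callable metric, but not a precomputed matrix). Only the brute-force search supports it. If you explicitly set algorithm='brute' it should work just fine, and more recent scikit-learn releases pick 'brute' automatically when metric='precomputed'. Note that the matrix you pass must contain distances, not kernel (similarity) values. Hope this helps!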
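Here is a minimal sketch of fitting and predicting with a precomputed distance matrix, assuming algorithm='brute' (the search method that lists 'precomputed' among its valid metrics). The toy 1-D data is invented for the example. At prediction time the matrix has one row per query point and one column per training point.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D data: two well-separated clusters (made up for illustration).
X_train = np.array([[0.0], [1.0], [10.0], [11.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.5], [10.5]])

# Pairwise distances: rows are query points, columns are training points.
D_train = np.abs(X_train - X_train.T)  # shape (4, 4)
D_test = np.abs(X_test - X_train.T)    # shape (2, 4)

knn = KNeighborsClassifier(n_neighbors=2, algorithm="brute", metric="precomputed")
knn.fit(D_train, y_train)
pred = knn.predict(D_test)
print(pred)
```

If you start from a kernel matrix, convert it to distances first; passing similarities directly would make the most dissimilar points look like the nearest neighbours.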
