I want to optimize KNN. There is a lot about SVM, RF and XGboost; but very few for KNN.
As far as I know the number of neighbors is one parameter to tune.
But what other parameters to test? Is there any good article?
Thank you
I want to optimize KNN. There is a lot about SVM, RF and XGboost; but very few for KNN.
As far as I know the number of neighbors is one parameter to tune.
But what other parameters to test? Is there any good article?
Thank you
KNN is so simple method that there is pretty much nothing to tune besides K. The whole method is literally:
for a given test sample x:
- find K most similar samples from training set, according to similarity measure s
- return the majority vote of the class from the above set
Consequently the only thing used to define KNN besides K is the similarity measure s, and that's all. There is literally nothing else in this algorithm (as it has 3 lines of pseudocode). On the other hand finding "the best similarity measure" is equivalently hard problem as learning a classifier itself, thus there is no real method of doing so, and people usually end up using either simple things (Euclidean distance) or use their domain knowledge to adapt s to the problem at hand.
Lejlot, pretty much summed it all. K-NN is so simple that it's an instance based nonparametric algorithm, that's what makes it so beautiful, and works really well for certain specific examples. Most of K-NN research is not in K-NN itself but in the computation and hardware that goes into it. If you'd like some readings on K-NN and machine learning algorithms Charles Bishop - Pattern Recognition and Machine Learning. Warning: it is heavy in the mathematics, but, Machine Learning and real computer science is all math.
By optimizing if you are also focusing on the reduction of prediction time (you should) then there are other aspects which you can implement to make the algorithm more efficient (But these are not parameter tuning). The major draw back with the KNN is that with the increasing number of training examples, the prediction time also goes high thus performance go low.
To optimize, you can check on the KNN with KD-trees, KNN with inverted lists(index) and KNN with locality sensitive hashing (KNN with LSH). These will reduce the search space during the prediction time thus optimizing the algorithm.