
I built a classifier using k-NN learners in an ensemble based on the random subspace method.

I have three predictors, each with 541 samples, and I developed an optimization procedure to find the best k (number of neighbours). I chose the k that maximizes the AUC of the classifier, with performance estimated by 10-fold cross-validation. The best k came out as 269 for each individual weak learner (there are 60 learners, a number chosen by a similar optimization).
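For concreteness, here is a minimal sketch of the kind of procedure I mean (Python/scikit-learn, with synthetic placeholder data; the shapes, the candidate k grid, and the use of BaggingClassifier with feature subsampling as a stand-in for the random subspace ensemble are all assumptions, not my actual code):

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data: 541 samples, 3 predictors, binary target (assumed shapes).
rng = np.random.default_rng(0)
X = rng.normal(size=(541, 3))
y = rng.integers(0, 2, size=541)

best_k, best_auc = None, -np.inf
for k in range(1, 272, 10):          # illustrative grid of neighbour counts
    ensemble = BaggingClassifier(
        KNeighborsClassifier(n_neighbors=k),
        n_estimators=60,             # 60 weak learners, as in the question
        bootstrap=False,             # keep all samples for every learner ...
        max_features=2,              # ... but give each a random feature subset
        bootstrap_features=False,
    )
    # 10-fold cross-validated AUC for this k
    auc = cross_val_score(ensemble, X, y, scoring="roc_auc", cv=10).mean()
    if auc > best_auc:
        best_k, best_auc = k, auc

print(f"best k = {best_k}, 10-fold CV AUC = {best_auc:.3f}")
```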

Now, my question is: Are 269 neighbours too many? I trust the results of the optimization, but I have never used so many neighbours and I am worried about overfitting.

Thank you in advance, MP


1 Answer


The choice of the k-value in k-NN is rather data dependent. We can argue about the general characteristics of smaller or larger k-values, but it is hard to label a specific number as good or bad in isolation. Because of this, if your CV implementation is correct, you can trust the result and move forward with it, since CV will give the optimal value for your specific case. As a more general discussion, we can say the following about the choice of k:

1- Smaller k-values: a smaller k may increase overall accuracy and is cheaper to evaluate, but it makes the system less robust to noisy input.

2- Larger k-values: a larger k makes the system more robust against noisy input, but it is more costly to evaluate and produces smoother, weaker decision boundaries compared to smaller k-values.

You can weigh these general characteristics when choosing the k-value in your application. However, for picking the optimal value, a procedure like CV will give you a definite answer; a minimal comparison of a small and a large k is sketched below.
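As a rough illustration only (the synthetic dataset, its label-noise level, and the two k-values are arbitrary assumptions chosen to make the trade-off visible, not a recommendation):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary problem with label noise (flip_y) to mimic noisy input.
X, y = make_classification(n_samples=500, n_features=5, flip_y=0.15, random_state=0)

for k in (3, 101):  # one small and one large neighbour count
    auc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y,
                          scoring="roc_auc", cv=10).mean()
    print(f"k = {k:3d}: mean 10-fold CV AUC = {auc:.3f}")
```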

  • Great explanation! Thank you so much! May I ask you some references about this relationship between number of neighbors and strength of decision boundaries? – marta Feb 13 '19 at 22:06
  • You can have a look at this paper: https://scialert.net/abstract/?doi=jas.2014.171.176 It is concisely explained there. The authors also give different viewpoints about the impact of k-NN parameters on accuracy. – Koralp Catalsakal Feb 14 '19 at 13:38
  • Thank you again for your help! Best wishes – marta Feb 14 '19 at 14:20