
I am using scikit-learn's KNN regressor to fit a model to a large dataset with n_neighbors = 100-500. Given the nature of the data, some parts (think: sharp, delta-function-like peaks) are better fit with fewer neighbors (n_neighbors ~ 20-50) so that the peaks are not smoothed out. The locations of these peaks are known (or can be measured).

Is there a way to vary the n_neighbors parameter?

I could fit two models and stitch them together, but that would be inefficient. It would be preferable to either prescribe 2-3 values for n_neighbors or, at worst, send in a list of n_neighbors values.
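For concreteness, here is a minimal sketch of the stitching idea, assuming a 1-D feature; `peak_centers` and `peak_width` are hypothetical placeholders for the known peak locations, and the k values are illustrative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical setup: 1-D feature with one sharp peak at x = 5.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 2000)).reshape(-1, 1)
y = np.sin(X.ravel()) + 5 * np.exp(-((X.ravel() - 5) ** 2) / 0.01)
peak_centers, peak_width = [5.0], 0.5  # known peak locations (hypothetical)

smooth = KNeighborsRegressor(n_neighbors=200).fit(X, y)  # broad regions
sharp = KNeighborsRegressor(n_neighbors=30).fit(X, y)    # near the peaks

def predict_stitched(X_new):
    """Low-k model inside the peak windows, high-k model elsewhere."""
    y_pred = smooth.predict(X_new)
    near_peak = np.zeros(len(X_new), dtype=bool)
    for c in peak_centers:
        near_peak |= np.abs(X_new.ravel() - c) < peak_width
    if near_peak.any():
        y_pred[near_peak] = sharp.predict(X_new[near_peak])
    return y_pred
```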

saud

1 Answer


I'm afraid not. In part, this is due to an algebraic assumption that the neighbour relationship is symmetric: A is a neighbour of B iff B is a neighbour of A. If you allowed different k values, you would be guaranteed to break that symmetry.

I think the major reason, though, is simply that the algorithm is simpler with a fixed number of neighbors, which yields better results in general. You have a specific case that KNN doesn't fit so well.

I suggest that you stitch together your two models, switching between them depending on the estimated second derivative.
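A rough sketch of that switch, assuming a 1-D feature with distinct x values; `k_smooth`, `k_sharp`, and `curvature_threshold` are hypothetical knobs you would tune for your data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def stitched_predict(X_train, y_train, X_new,
                     k_smooth=200, k_sharp=30, curvature_threshold=1.0):
    """Blend two KNN fits: use the low-k model wherever the high-k
    prediction shows strong curvature (i.e. near sharp peaks)."""
    smooth = KNeighborsRegressor(n_neighbors=k_smooth).fit(X_train, y_train)
    sharp = KNeighborsRegressor(n_neighbors=k_sharp).fit(X_train, y_train)

    # Sort by x so finite differences make sense.
    order = np.argsort(X_new.ravel())
    x_sorted = X_new.ravel()[order]
    y_smooth = smooth.predict(X_new[order])

    # Estimate the second derivative of the smooth fit by finite differences.
    d2 = np.gradient(np.gradient(y_smooth, x_sorted), x_sorted)

    y_out = y_smooth.copy()
    peaky = np.abs(d2) > curvature_threshold  # threshold is a tuning knob
    if peaky.any():
        y_out[peaky] = sharp.predict(X_new[order][peaky])

    # Restore the original ordering of X_new.
    result = np.empty_like(y_out)
    result[order] = y_out
    return result
```

The threshold on |d2| stands in for "is this a peak region"; since you already know the peak locations, a simple window around each known peak (as in the sketch in the question) may be the cheaper switch.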

Prune
  • Thanks, I was afraid that was the case. I did not know about the symmetry postulate but it makes sense. – saud Nov 10 '16 at 22:24