
Why is normalization necessary in KNN? I know that it equalizes the influence of all the features on the result, but won't the K nearest points to a particular point V before normalization be EXACTLY THE SAME as the K nearest points to V after normalization? So what difference does normalization make to the Euclidean distances, given that KNN depends entirely on them? Thanks in advance!

redb17

2 Answers


Most normalization techniques will change the K nearest neighbours if the variability differs across dimensions.

Imagine a dataset of A=(-5,0), B=(-5,1) and C=(5,1), and a point of interest (4.5, 0). Clearly, C is its closest neighbour.

After min-max normalization to [-1, 1] in both dimensions, the dataset becomes A=(-1,-1), B=(-1,1), C=(1,1), and the point of interest maps to (0.9, -1) in the new space. Thus, A is now the closest neighbour.
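The example can be checked with a short NumPy sketch (min-max scaling here maps each dimension to [-1, 1] using the dataset's own ranges):

```python
import numpy as np

# Dataset from the example above, plus the query point.
X = np.array([[-5.0, 0.0],   # A
              [-5.0, 1.0],   # B
              [ 5.0, 1.0]])  # C
names = ["A", "B", "C"]
q = np.array([4.5, 0.0])

def nearest(points, query):
    dists = np.linalg.norm(points - query, axis=1)
    return names[int(np.argmin(dists))]

print(nearest(X, q))  # C: nearest in the raw space

# Min-max scale each dimension to [-1, 1].
lo, hi = X.min(axis=0), X.max(axis=0)
scale = lambda p: 2 * (p - lo) / (hi - lo) - 1

print(scale(q))            # the query becomes (0.9, -1.0)
print(nearest(scale(X), scale(q)))  # A: nearest after scaling
```

The nearest neighbour flips from C to A, exactly as described, because scaling stretches the y-axis (range 1) by ten times more than the x-axis (range 10).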

dedObed
  • 1,313
  • 1
  • 11
  • 19

I agree with dedObed. However, that answer may seem to imply that scaling variables is not advisable in KNN. When variables with very different orders of magnitude are involved, the ones with the largest magnitudes dominate the distance computation, which may not be desirable. Scaling all variables prevents this problem.
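A small illustration of that dominance effect, using made-up data (an income-like feature around 1000 and a rating in [0, 1]) and z-score scaling:

```python
import numpy as np

# Hypothetical two-feature points: (income-like value, rating in [0, 1]).
X = np.array([[1000.0, 0.0],
              [1005.0, 1.0]])
q = np.array([1004.0, 0.0])

# Raw space: the large-magnitude income axis dominates, so the point
# with the closer income wins even though its rating differs by a
# full unit while the other point's rating matches exactly.
print(np.argmin(np.linalg.norm(X - q, axis=1)))  # 1

# After z-score scaling, both features contribute comparably and the
# point with the matching rating becomes the nearest neighbour.
mu, sd = X.mean(axis=0), X.std(axis=0)
print(np.argmin(np.linalg.norm((X - mu) / sd - (q - mu) / sd, axis=1)))  # 0
```

In practice one would fit the scaling parameters on the training set (e.g. with scikit-learn's StandardScaler) and apply them to queries, as done here with `mu` and `sd`.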

  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jun 03 '22 at 21:05