
I'm performing kNN prediction of a numerical variable (kNN regression, or locally weighted averaging). I am using Euclidean distance with 1/distance as weights, but I don't know how this is applied, and I have a question:

How exactly is the weighting in WEKA's IBk for regression performed? Is it a simple function like 1/distance or something more complicated? I looked in the source code but couldn't understand it. And how exactly is the distance defined: is it Euclidean or some modification of it? What does this code mean (these are lines 867 and 868 from the IBk source code)?

distances[i] = distances[i]*distances[i];                   // square the distance
distances[i] = Math.sqrt(distances[i]/m_NumAttributesUsed); // divide by the number of attributes used, then take the square root
alchemist_bg
1 Answer


In kNN, weighting allows neighbors closer to the query point to have more influence on the predicted value.

For numerical regression in Weka with IBk, the weighting is performed as shown in the linked method.

I have summarized the steps in the following pseudo code.

Step 0: prediction = 0, total = 0

Step 1:

For each of the k-neighbors:

  1. Calculate distance to neighbor i

  2. Calculate weight: weight = 1 / (distance)

  3. Update prediction: prediction = prediction + neighbor i's class value * weight

  4. Update total: total = total + weight

Step 2: prediction = prediction / total

Step 3: return prediction
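The steps above can be sketched in Java. This is a minimal illustration, not Weka's actual code: the class and method names are my own, and the distance normalization simply mirrors the two lines quoted in the question (squared Euclidean distance divided by the number of attributes used, then square-rooted). A small epsilon guards against division by zero when the query coincides with a neighbor.

```java
public class KnnRegressionSketch {

    // Euclidean distance with the IBk-style normalization from the question:
    // sqrt(squaredDistance / numAttributes).
    static double normalizedDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum / a.length);
    }

    // Inverse-distance-weighted average of the k neighbors' class values,
    // following Steps 0-3 of the pseudo code above.
    static double predict(double[][] neighbors, double[] classValues, double[] query) {
        double prediction = 0.0;
        double total = 0.0;
        for (int i = 0; i < neighbors.length; i++) {
            double dist = normalizedDistance(neighbors[i], query);
            double weight = 1.0 / Math.max(dist, 1e-12); // avoid division by zero
            prediction += classValues[i] * weight;       // Step 1.3
            total += weight;                             // Step 1.4
        }
        return prediction / total;                       // Step 2
    }

    public static void main(String[] args) {
        double[][] neighbors = {{0.0, 0.0}, {1.0, 1.0}, {2.0, 2.0}};
        double[] classValues = {10.0, 20.0, 30.0};
        // The query is nearest to the second neighbor, so the
        // prediction is pulled strongly toward 20.
        System.out.println(predict(neighbors, classValues, new double[]{0.9, 0.9}));
    }
}
```

Note that this sketch does not normalize the attributes themselves; as discussed in the comments below, Weka additionally rescales each attribute to the 0-1 range before computing distances, which will change the results relative to a plain implementation.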

Walter
  • Yes, that should be it! This is the standard algorithm. I am trying to reproduce the method in pure KNIME (no WEKA nodes) in order to add some features, but I can't get the same result as IBk! I calculated it by hand and it still doesn't work. I'm not sure, but on lines 867 and 868 there is code that modifies the distance somehow. Does that matter? Any ideas? – alchemist_bg Sep 27 '13 at 17:39
  • I can't figure out why, but it does seem that they incorporate the number of used attributes into the distance measure. Have you tried reading the paper IBk is based on? It can be found [here](http://sci2s.ugr.es/pr/pdf/1991-Aha-ML.pdf). Just as a side note, your original question concerned the weighting. I'm happy to try and help, but if you are/were concerned about the specific lines of code you listed, you may receive more helpful responses if you mention them in the question! – Walter Sep 27 '13 at 20:02
  • I saw that paper but I couldn't find an explanation. Well, my initial concern was the difference between results from classic kNN and IBk. I then started searching for the reason and found the source code. I saw the two lines and thought they might be the cause of the difference. I tried the calculation by hand, but again I couldn't reproduce the IBk result. – alchemist_bg Sep 27 '13 at 20:38
  • I asked the same question in the WEKA forum and was told by Mark Hall (one of WEKA's core developers) that they use internal normalization of attributes to the 0-1 range. This could be the reason for the difference I noticed. Right now I don't have time to test this, but I hope I can give more feedback in the near future. – alchemist_bg Oct 04 '13 at 05:25