
From the slides of a course, I found these definitions:

Given a set P in R^D and a query point q, its nearest neighbour (NN) is the point p_0 in P such that:

dist(p_0, q) <= dist(p, q), for every p in P.

Similarly, with an approximation factor ε, where 0 < ε < 1, the ε-NN is a point p_0 such that:

dist(p_0, q) <= (1+ε) * dist(p, q), for every p in P.

(I wonder why ε can't reach 1).

We build a KD-tree and then search for the NN with this algorithm (shown as an image in the slides), which is correct as far as my reasoning and my testing go.
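Since the image didn't survive, here is a minimal Python sketch of the standard recursive KD-tree NN search the question refers to (the node layout and names are my assumptions, not the slides' exact pseudocode):

```python
import math

class Node:
    """KD-tree node: either a leaf holding a list of points, or an inner
    node with a cutting dimension, a cut value, and two children."""
    def __init__(self, points=None, cut_dim=None, cut_value=None,
                 left=None, right=None):
        self.points = points          # only set for leaves
        self.cut_dim = cut_dim        # splitting dimension (inner nodes)
        self.cut_value = cut_value    # splitting coordinate (inner nodes)
        self.left = left
        self.right = right

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nn_search(node, q, best=None, best_dist=float('inf')):
    """Exact NN: descend to q's side first, then visit the far side
    only if the splitting hyperplane is closer than the current best."""
    if node.points is not None:                  # leaf: scan its points
        for p in node.points:
            d = dist(p, q)
            if d < best_dist:
                best, best_dist = p, d
        return best, best_dist
    # choose near/far children relative to q
    if q[node.cut_dim] <= node.cut_value:
        near, far = node.left, node.right
    else:
        near, far = node.right, node.left
    best, best_dist = nn_search(near, q, best, best_dist)
    # prune: visit the far child only if the hyperplane is within best_dist
    if abs(q[node.cut_dim] - node.cut_value) < best_dist:
        best, best_dist = nn_search(far, q, best, best_dist)
    return best, best_dist
```

The pruning test `abs(q[node.cut_dim] - node.cut_value) < best_dist` is the `cut-coor(q) + current best distance > node's cut-value` comparison from the slides, written for both sides at once.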

How should I modify the above algorithm, in order to perform Approximate Nearest Neighbour Search (ANNS)?

My thought is to multiply the current best distance (at the point where it is updated in a leaf) by ε and leave the rest of the algorithm as is. I am not sure, however, whether this is correct. Can someone explain?

PS: I understand how the exact NN search works.

Note that I asked on the Computer Science site, but got no answer!

gsamaras
  • (I wonder why ε can't reach 1) There's probably no fundamental reason. My guess is that the authors wanted to assume 1/ε > 1 and were concerned only with the asymptotic behavior as ε goes to zero. – David Eisenstat Sep 11 '14 at 14:53
  • Please don't cross-post on multiple StackExchange sites; that violates site rules. – D.W. Sep 15 '14 at 04:17

1 Answer


The one modification needed is to replace the current best distance with current best distance/(1+ε). This prunes the nodes that cannot contain a point violating the new inequality.

The reason that this works is that (assuming that cut-coor(q) is on the left side) the test

cut-coor(q) + current best distance > node's cut-value

is checking whether the hyperplane separating left-child and right-child is closer than the current best distance. That is a necessary condition for a point in right-child to be closer than the current best to the query point q, because the line segment joining q to any point in right-child must pass through that hyperplane. By replacing d(p_0, q) = current best distance with current best distance/(1+ε), we are instead checking whether any point p on the right side could satisfy

d(p, q) < d(p_0, q)/(1+ε),

which is equivalent to

(1+ε) d(p, q) < d(p_0, q),

which is a witness to the violation of the approximate nearest neighbor guarantee.
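Concretely, the answer's modification amounts to one changed line in the pruning test. A minimal Python sketch (the node layout, `approx_nn_search` name, and helper functions are my assumptions, not the original slides' code):

```python
import math

class Node:
    """KD-tree node: a leaf holds `points`; an inner node holds
    `cut_dim`, `cut_value`, and `left`/`right` children."""
    def __init__(self, points=None, cut_dim=None, cut_value=None,
                 left=None, right=None):
        self.points = points
        self.cut_dim = cut_dim
        self.cut_value = cut_value
        self.left = left
        self.right = right

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def approx_nn_search(node, q, eps, best=None, best_dist=float('inf'),
                     shrink=None):
    """(1+eps)-approximate NN search. Identical to the exact search
    except that pruning uses best_dist/(1+eps) instead of best_dist."""
    if shrink is None:
        shrink = 1.0 / (1.0 + eps)   # computed once, then reused (multiply,
                                     # don't divide, per the comments below)
    if node.points is not None:                  # leaf: scan its points
        for p in node.points:
            d = dist(p, q)
            if d < best_dist:
                best, best_dist = p, d
        return best, best_dist
    if q[node.cut_dim] <= node.cut_value:
        near, far = node.left, node.right
    else:
        near, far = node.right, node.left
    best, best_dist = approx_nn_search(near, q, eps, best, best_dist, shrink)
    # "lie to ourselves" that the current best is better than it is:
    # visit the far subtree only if it could hold a point p with
    # (1+eps) * d(p, q) < d(p_0, q), i.e. a witness against the guarantee
    if abs(q[node.cut_dim] - node.cut_value) < best_dist * shrink:
        best, best_dist = approx_nn_search(far, q, eps, best, best_dist,
                                           shrink)
    return best, best_dist
```

With eps = 0 this degenerates to the exact search; larger eps shrinks the pruning radius, skips more subtrees, and weakens the accuracy guarantee accordingly.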

David Eisenstat
  • Thanks for the comment and the answer. So my feeling for just multiplying was wrong. I am going to draw on paper to see how this is true, but if you can help me getting the feeling, you are welcome. You see I want to understand, not just make the code run. :) – gsamaras Sep 11 '14 at 14:59
  • @G.Samaras I added some explanation. – David Eisenstat Sep 11 '14 at 15:06
  • In your approach, the greater the ε the more accurate the algorithm is. Right? – gsamaras Sep 11 '14 at 15:08
  • @G.Samaras Greater values of ε mean lesser accuracy, since the smaller that `current best distance/(1+ε)` is, the less likely we are to explore the other child. – David Eisenstat Sep 11 '14 at 15:09
  • That's what puzzled me about your explanation. Yes, you are right! – gsamaras Sep 11 '14 at 15:11
  • @G.Samaras Sorry, botched the algebra there. – David Eisenstat Sep 11 '14 at 15:13
  • Yeah I saw that. So David, my original feeling wasn't that bad after all, or am I mistaken? I mean, in the part we search the points of a leaf, we keep the best distance found and we multiply it by (1+ε) and the rest of the algorithm remains intact. Do you agree? – gsamaras Sep 11 '14 at 15:21
  • @G.Samaras Nope, still a divide by 1+ε > 1. The idea is that we lie to ourselves by pretending that the current best is better than it actually is, in the hope that we avoid some traversals. – David Eisenstat Sep 11 '14 at 15:25
  • I see, however division is costly. On the other hand doing the multiplication every time you are in an inner node is also "bad". Thanks anyway! – gsamaras Sep 11 '14 at 15:29
  • 1
    @G.Samaras Compute 1/(1+ε) once and then multiply by that instead. – David Eisenstat Sep 11 '14 at 15:34