I'm trying to implement an integer program for a nearest neighbor classifier in Python using cvxpy.

Short intro

Given a dataset of n points, each with a color (red or blue), we would like to choose the minimal number of candidate points such that, for each point that isn't a candidate, its closest candidate has the same color.
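As a reference point for any formulation, the condition above can be checked directly on small inputs. The sketch below is only an illustration, not part of the original approach: the names `is_valid` and `smallest_valid_set` are hypothetical, points are assumed to be distinct coordinate tuples compared by squared Euclidean distance, and a tie between a same-color and a different-color candidate is treated as a failure.

```python
from itertools import combinations


def sqdist(p, q):
    """Squared Euclidean distance (exact for integer coordinates)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def is_valid(points, colors, chosen):
    """True if every non-candidate's nearest candidate has the same color.

    Assumption: ties between a same-color and a different-color
    candidate count as invalid.
    """
    chosen = set(chosen)
    if not chosen:
        return False
    inf = float("inf")
    for i in range(len(points)):
        if i in chosen:
            continue
        same = min((sqdist(points[i], points[k]) for k in chosen
                    if colors[k] == colors[i]), default=inf)
        other = min((sqdist(points[i], points[k]) for k in chosen
                     if colors[k] != colors[i]), default=inf)
        if same >= other:
            return False
    return True


def smallest_valid_set(points, colors):
    """Exhaustive search over subsets by increasing size (exponential time)."""
    n = len(points)
    for size in range(1, n + 1):
        for chosen in combinations(range(n), size):
            if is_valid(points, colors, chosen):
                return list(chosen)
    return None


points = [(1, 1), (4, 1), (10, 1), (5, 1)]
colors = ["red", "red", "red", "blue"]
print(smallest_valid_set(points, colors))  # -> [1, 2, 3]
```

This only scales to a handful of points, but it gives a ground truth against which an integer programming model can be compared.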

My flow

Given a set of n points (with colors) define an indicator vector I (|I| = n),

I_i = 1 if and only if vertex i is chosen as a candidate

In addition, I defined two more vectors, A and B (|A| = |B| = n), as follows:

A_i = the distance from v_i to its closest candidate with the **same** color
B_i = the distance from v_i to its closest candidate with a **different** color

Therefore, I have n constraints: B_i > A_i for every i

My objective is to minimize the sum of the entries of I (which is the number of candidates)

My Issue

It seems that the vectors A and B are not constant: they depend on I. Choosing a candidate changes its entry in I, which in turn changes A and B, and the constraints depend on those vectors.

Any suggestions?

Thanks !

Timor

1 Answer

To recap: you want to find the smallest set of examples belonging to a given training set such that the resulting nearest neighbor classifier achieves perfect accuracy on that training set.

I would suggest that you formulate this as follows. Create a 0–1 variable x(e) for each example e indicating whether e is chosen. For each ordered pair of examples e and e′ with different labels, write a constraint

x(e′) ≤ ∑_{e′′ ∈ C(e, e′)} x(e′′)

where C(e, e′) is the set of examples e′′ with the same label as e such that e′′ is closer to e than e′ is to e (including e′′ = e). This means that, if e′ is chosen, then it is not the nearest chosen example to e.
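To make the definition of C(e, e′) concrete, here is a minimal sketch that computes it for examples given by index. The function name `closer_same_label` is hypothetical, and squared Euclidean distance with a strict "closer than" comparison is assumed. The data is the four-point example raised in the comments.

```python
def sqdist(p, q):
    """Squared Euclidean distance (exact for integer coordinates)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closer_same_label(points, labels, e, e_prime):
    """Indices of C(e, e'): same label as e and strictly closer to e than e' is.

    e itself is included because its distance to e is 0 (assuming
    e and e' are distinct points).
    """
    d = sqdist(points[e], points[e_prime])
    return [k for k in range(len(points))
            if labels[k] == labels[e] and sqdist(points[e], points[k]) < d]


points = [(1, 1), (4, 1), (10, 1), (5, 1)]
labels = ["red", "red", "red", "blue"]

# e = (4,1), e' = (5,1): only (4,1) itself is closer than distance 1.
print(closer_same_label(points, labels, 1, 3))  # -> [1]
# e = (1,1), e' = (5,1): (1,1) and (4,1) are both closer than distance 4.
print(closer_same_label(points, labels, 0, 3))  # -> [0, 1]
```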

We also need

∑_e x(e) ≥ 1

to disallow the empty set. Finally, the objective is

minimize ∑_e x(e).
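A possible cvxpy translation of this formulation is sketched below. It is an illustration under assumptions rather than a definitive implementation: the helper names (`build_constraints`, `is_feasible`, `solve_with_cvxpy`) are hypothetical, points are assumed to be distinct coordinate tuples compared by squared Euclidean distance, and `problem.solve()` only works if a mixed-integer-capable solver is installed alongside cvxpy.

```python
from itertools import product


def sqdist(p, q):
    """Squared Euclidean distance (exact for integer coordinates)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def build_constraints(points, labels):
    """One (j, closer) pair per ordered pair (i, j) with different labels.

    Each pair encodes the constraint x[j] <= sum(x[k] for k in closer),
    where closer is the index set C(i, j).
    """
    n = len(points)
    rows = []
    for i, j in product(range(n), repeat=2):
        if labels[i] == labels[j]:
            continue
        d = sqdist(points[i], points[j])
        closer = [k for k in range(n)
                  if labels[k] == labels[i]
                  and sqdist(points[i], points[k]) < d]
        rows.append((j, closer))
    return rows


def is_feasible(rows, x):
    """Check a 0-1 list x against the constraints without any solver."""
    return sum(x) >= 1 and all(
        x[j] <= sum(x[k] for k in closer) for j, closer in rows)


def solve_with_cvxpy(points, labels):
    """Build and solve the 0-1 program; returns the chosen indices."""
    import cvxpy as cp  # imported here so the helpers above work without it

    n = len(points)
    x = cp.Variable(n, boolean=True)
    constraints = [x[j] <= sum(x[k] for k in closer)
                   for j, closer in build_constraints(points, labels)]
    constraints.append(cp.sum(x) >= 1)  # disallow the empty set
    problem = cp.Problem(cp.Minimize(cp.sum(x)), constraints)
    problem.solve()  # assumption: a mixed-integer solver is available
    return [i for i in range(n) if x.value[i] > 0.5]
```

Calling `solve_with_cvxpy(points, labels)` with coordinate tuples and a parallel list of labels would return the indices of the chosen examples. `is_feasible` is useful for checking a proposed 0-1 assignment by hand before involving a solver.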

David Eisenstat
  • I think these constraints don't rule out some configurations that are not solutions. Suppose that, for a point x, the other points ordered by proximity to x are x1, x2, ..., x(n-1), where x1 and x2 have the same color as x, x3 has a different color, x4 has the same color, and x5 has a different color. – joaopfg May 13 '22 at 18:25
  • If x = x1 = x2 = x3 = x4 = 0 and x5 = 1, it will satisfy your constraint, since x3 <= x1 + x2 + x3. But it doesn't satisfy the condition of the problem: x and x5 have different colors, x = 0, x5 = 1, and x5 is the closest chosen point to x. – joaopfg May 13 '22 at 18:28
  • Am I missing something? – joaopfg May 13 '22 at 18:30
  • @JohnDoe Yes, there is one constraint per (e, e') pair. The constraint for (1, 3) won't disallow it, but (1, 5) will. – David Eisenstat May 13 '22 at 20:12
  • Cool ! Got it now – joaopfg May 13 '22 at 20:54
  • Hi David, first of all, thanks. Consider the following example: Red: (1,1), (4,1), (10,1); Blue: (5,1). Choosing (1,1) and (10,1) as the red candidates and (5,1) as the blue candidate satisfies all our constraints, yet the closest candidate to (4,1) is (5,1), which is blue. I hope that's clear enough. – Timor May 14 '22 at 14:39
  • In short, I think those constraints don't account for this case: it is possible that e is closer to e' than to e''. – Timor May 14 '22 at 14:44
  • @Timor no, the constraint where e = (4,1) and e' = (5,1) prevents this since it's x(5,1) <= x(4,1), and x(4,1) = 0. – David Eisenstat May 14 '22 at 14:44
  • @DavidEisenstat but the constraint for e = (4,1) and e' = (5,1) is x(5,1) <= x(4,1) + x(1,1), and x(1,1) = 1, since C(e,e') = C((4,1),(5,1)) = {(4,1), (1,1)}. – Timor May 14 '22 at 14:46
  • @Timor no, C((4,1),(5,1)) = {(4,1)}, excluding (1,1) since it is not closer to (4,1) than (5,1). – David Eisenstat May 14 '22 at 15:01
  • @Timor I tightened up the wording on the definition of C. – David Eisenstat May 14 '22 at 15:03
  • @DavidEisenstat Perfecto! That was my mistake! Thanks, David! – Timor May 14 '22 at 15:04