I'm using a linear program to test whether two sets of points are separable by a line. In theory this works, but in practice there seem to be numerical issues when the points are close to being collinear or when the two sets are close to each other.
The solver I'm using is scipy.optimize.linprog.
I wrote some code to give an example of what I'm talking about. It produces a cloud 'locations' of N points and picks some random lines; for each line it uses a margin around the line to divide a subset of 'locations' into two regions, X and Y, and then checks whether the linear program finds a separating line between X and Y, or instead concludes that no such line exists (i.e. fails to find a feasible point).
As you can see from the output (below), as the margin shrinks from 1 to 2^(-9), the probability that the linear program correctly detects that the two regions are linearly separable decays. This suggests there are issues detecting that two point clouds which are very close together can still be separated.
Note that because I'm feeding the linear program sets which are guaranteed to be linearly separable, the outputs should all be 1.0.
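For reference, the feasibility problem the code sets up (following the joyofdata post linked in the code) is: find (w1, w2, t) such that w·x >= t + 1 for every x in X and w·y <= t - 1 for every y in Y; any feasible point gives a strictly separating line w·z = t. Here is a minimal sanity check of that encoding on a hand-picked toy pair of points (just for illustration, not part of the experiment below):

from scipy.optimize import linprog

# Toy instance: X = {(0, 0)}, Y = {(1, 0)} are separated by the line z1 = 0.5.
# Variables are (w1, w2, t); each row encodes -w.x + t <= -1 for x in X
# and w.y - t <= -1 for y in Y.
A_ub = [[-0.0, -0.0,  1.0],   # from the single point in X
        [ 1.0,  0.0, -1.0]]   # from the single point in Y
b_ub = [-1.0, -1.0]
res = linprog(c=[0, 0, 0], A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
print(res.success)  # expected True; e.g. w = (-2, 0), t = -1 is feasible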
import numpy as np
from scipy.optimize import linprog

N = 100
for minv in range(10):
    margin = 1 / (2 ** minv)
    locations = np.random.uniform(-1, 1, [N, 2])
    tests = []
    for i in range(50):
        p = np.random.normal(0, 1, [3])
        X = []
        Y = []
        for a in locations:
            if np.dot(a, [p[0], p[1]]) < p[2] - margin:
                X.append(a)
            if np.dot(a, [p[0], p[1]]) > p[2] + margin:
                Y.append(a)
        # X and Y are two linearly separable subsets of 'locations'
        A = []
        b = []
        if X != [] and Y != []:
            # This if is just to avoid an edge case that causes problems with linprog
            for s in X:
                A.append([-1 * v for v in s] + [1])
                b.append(-1)
            for s in Y:
                A.append(list(s) + [-1])
                b.append(-1)
            # See: https://www.joyofdata.de/blog/testing-linear-separability-linear-programming-r-glpk/
            # (w1, w2, t) are free variables, so no bounds on any of them
            res = linprog(c=[0, 0, 0], A_ub=A, b_ub=b, bounds=(None, None))
            tests.append(res["success"])
    print(np.mean(tests))
Output:
1.0
1.0
0.909090909091
0.8
0.805555555556
0.5
0.375
0.444444444444
0.378378378378
0.410256410256
My question:
How can I reliably (and efficiently) solve the problem of detecting whether two sets of points are linearly separable? (Another approach would be to build convex hulls, and I've mostly written this out; there are just some issues with using Qhull: Qhull Convex hull wants me to input at least 3 points. A possible workaround is sketched below.)
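In case it's useful, here is a rough sketch of the hull-based alternative using shapely instead of calling Qhull directly (shapely is an extra dependency, and this is only the idea, not something I've tested against the experiment above). Two finite point sets are strictly separable by a line exactly when their convex hulls are disjoint, and shapely's convex_hull degrades gracefully to a segment or a single point when a set has fewer than 3 points:

from shapely.geometry import MultiPoint

def hulls_disjoint(X, Y):
    # X and Y are strictly linearly separable iff their convex hulls do not intersect.
    # MultiPoint(...).convex_hull returns a Polygon, LineString, or Point depending on
    # how many distinct points there are, so the "fewer than 3 points" case is handled.
    hull_x = MultiPoint([tuple(p) for p in X]).convex_hull
    hull_y = MultiPoint([tuple(p) for p in Y]).convex_hull
    return not hull_x.intersects(hull_y)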
Is this a bug in scipy linprog? Or am I just not using it correctly?