I have what I believe is a fairly simple constraint satisfaction problem, but I cannot find a suitable package for implementing an algorithm.

I wish to subset a data set of points. Each point comes with a list of other points that must be excluded from the subset if that point is included. For example:

  points Must_Exclude
1      A            B,E
2      B            
3      C            F,G,H
4      D            
5      E            D
6      F            
7      G            H
8      H            

I want to maximize the number of points I can put in my subset without breaking any rules. My data contains thousands of points. What are the names of algorithms designed for this type of problem? Are there any packages in R that I should look at? Should I look to other programming languages?

Henry Holm
  • I would throw this first at MaxSAT or Integer Programming. The problem might be more complex than you thought (from a theoretical complexity view). The MaxSAT formulation is trivial; the IP formulation is more or less what's done in the maximum independent set problem. (For further analysis of the underlying complexity, I would probably start by looking at MaxHornSat or something like that.) – sascha Feb 01 '19 at 20:42
  • The problem is called maximum independent set. – David Eisenstat Feb 01 '19 at 21:12
  • @DavidEisenstat Hi David, Thanks for your comment. I think "maximum independent set" is close to what I am describing. Do you know any good packages for encoding and solving these problems? – Henry Holm Feb 01 '19 at 22:01
  • Hi @sascha, thanks so much for your comment. I am not a data scientist by training, but I am trying to implement some of this for data problems I have. Do you know any good packages or resources for coding and implementing either of the algorithms you mention? Either way, you have definitely pointed me in the right direction here. – Henry Holm Feb 01 '19 at 22:07
  • This is more combinatorial optimization than data science (which is continuous optimization in most cases). The mentioned techniques are general purpose and there are many open source solvers available. Pick one supported in your environment, e.g. Cbc or GLPK (through any kind of wrapper API; those are all implemented in low-level programming languages). The IP formulation of the maximum independent set problem is all over the web. You might want to check out some basics of propositional calculus, e.g. `a -> not b <-> not a OR not b` (or in 0-1 integer programming form: `a + b <= 1` and `a in {0,1}, b in {0,1}`). – sascha Feb 01 '19 at 22:24
  • What @sascha said, but make sure your solver does clique cuts. – David Eisenstat Feb 01 '19 at 23:55
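
The 0-1 formulation hinted at in the comments above can be written out as a small integer program. This is a sketch of the standard maximum independent set model; the symbols $V$ (the set of points), $E$ (the set of pairs where one point excludes the other) and $x_v$ (1 if point $v$ is selected, 0 otherwise) are notation introduced here, not taken from the thread.

```latex
\begin{aligned}
\text{maximize}\quad   & \sum_{v \in V} x_v \\
\text{subject to}\quad & x_u + x_v \le 1 \qquad \text{for all } \{u, v\} \in E, \\
                       & x_v \in \{0, 1\} \qquad \text{for all } v \in V.
\end{aligned}
```

Each exclusion rule contributes one constraint of the form `a + b <= 1`, so the direction of a rule (A excludes B, or B excludes A) does not matter for feasibility.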

1 Answer

Solve this as an integer linear programming (ILP) problem with eight binary variables, one for each data point A to H, where every exclusion rule becomes a constraint that at most one of the two points may be selected. The package lpSolve, for example, can do this.

## Define inputs
obj <- rep(1, 8)                # 8 binary variables for A..H
A <- rbind(                     # one row per exclusion rule:
    c(1,1,0,0,0,0,0,0),         # A <> B
    c(1,0,0,0,1,0,0,0),         # A <> E
    c(0,0,1,0,0,1,0,0),         # C <> F
    c(0,0,1,0,0,0,1,0),         # C <> G
    c(0,0,1,0,0,0,0,1),         # C <> H
    c(0,0,0,1,1,0,0,0),         # D <> E
    c(0,0,0,0,0,0,1,1))         # G <> H
dir <- rep("<=", 7)             # all constraints '<='
rhs <- rep(1, 7)                # all right hand sides = 1
## maximise solution
sol <- lpSolve::lp("max", obj, A, dir, rhs,
                   all.bin = TRUE, num.bin.solns = 1)
sol$solution
## [1] 0 1 0 0 1 1 0 1

That is, one solution is (B, E, F, H); of course, there may be other combinations of the same size, for instance (A, D, F, G). You can get more solutions by setting option num.bin.solns to some value > 1.
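
With thousands of points, typing the constraint matrix by hand is not practical. The following is a minimal sketch of how it might be generated from the question's table, assuming the data sits in a data frame with the columns `points` and `Must_Exclude` (comma-separated, no spaces). The names `rules`, `excluded` and `pairs` are made up for illustration, and the final call to `lpSolve::lp` only runs if that package is installed.

```r
# Exclusion rules from the question; the data frame layout is an assumption.
rules <- data.frame(
  points       = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Must_Exclude = c("B,E", "", "F,G,H", "", "D", "", "H", ""),
  stringsAsFactors = FALSE
)

# Split each comma-separated list; an empty string yields no entries.
excluded <- strsplit(rules$Must_Exclude, ",", fixed = TRUE)

# One row per rule: 'from' excludes 'to'.
pairs <- data.frame(
  from = rep(rules$points, lengths(excluded)),
  to   = unlist(excluded),
  stringsAsFactors = FALSE
)

# Constraint matrix: one row per rule, one column per point,
# with a 1 in the two columns of the points that exclude each other.
n <- nrow(rules)
A <- matrix(0, nrow = nrow(pairs), ncol = n,
            dimnames = list(NULL, rules$points))
row_id <- seq_len(nrow(pairs))
A[cbind(row_id, match(pairs$from, rules$points))] <- 1
A[cbind(row_id, match(pairs$to,   rules$points))] <- 1

# Solve only if lpSolve is available (same call as in the answer above).
if (requireNamespace("lpSolve", quietly = TRUE)) {
  sol <- lpSolve::lp("max", rep(1, n), A,
                     rep("<=", nrow(A)), rep(1, nrow(A)),
                     all.bin = TRUE)
  print(rules$points[sol$solution > 0.5])  # names of the selected points
}
```

For the example table this reproduces the 7 x 8 matrix written out by hand above. Note that the matrix is dense, so for very large inputs its size (rules times points) may become the limiting factor before the solver does.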

Hans W.
  • Amazing! Thank you so much. This addresses my problem very clearly and is a great introduction to this package for me. Thank you! – Henry Holm Feb 04 '19 at 16:51
  • Since you know this problem so well, do you know how to get multiple solutions using the package lpSolveAPI? I asked this question, using the same problem as above, here: https://stackoverflow.com/questions/55445759/how-to-get-lpsolveapi-to-return-all-possible-solutions – Henry Holm Mar 31 '19 at 23:08
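
For the eight-point example, the full set of optimal solutions can also be listed without any solver by checking all 2^8 subsets. This is only a brute-force sketch for checking small instances, not a substitute for lpSolve or lpSolveAPI, and it will not scale to thousands of points. The names `pts`, `edges` and `grid` are chosen here for illustration.

```r
# Points and exclusion pairs from the question.
pts   <- LETTERS[1:8]
edges <- rbind(c("A", "B"), c("A", "E"), c("C", "F"), c("C", "G"),
               c("C", "H"), c("D", "E"), c("G", "H"))

# Every 0/1 assignment of the eight points, one assignment per row.
grid <- as.matrix(expand.grid(rep(list(0:1), length(pts))))
colnames(grid) <- pts

# An assignment is feasible if no pair has both of its points selected.
ok <- rep(TRUE, nrow(grid))
for (k in seq_len(nrow(edges))) {
  ok <- ok & (grid[, edges[k, 1]] + grid[, edges[k, 2]] <= 1)
}
feasible <- grid[ok, , drop = FALSE]

# Keep the feasible assignments that select the most points.
sizes <- rowSums(feasible)
best  <- feasible[sizes == max(sizes), , drop = FALSE]
optimal_sets <- unname(apply(best, 1, function(x) {
  paste(pts[x == 1], collapse = ",")
}))

print(max(sizes))
print(optimal_sets)
```

On this instance the maximum size is 4 and there are six optimal subsets, including the two named in the answer, (B, E, F, H) and (A, D, F, G).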