This is a follow-up question to Finding max value of a weighted subset sum of a power set. Whereas the previous question solves (to optimality) problems of size <= 15 in reasonable time, I would like to solve problems of size ~2000 to near-optimality.

As a small example problem, I am given a certain range of nodes:

var range = [0,1,2,3,4];

A function creates a power set for all the nodes in the range and assigns each combination a numeric score; combinations with negative scores are removed, resulting in the following array S. S[n][0] is the bitwise OR of the included nodes' bit values (node i contributes 2^i), and S[n][1] is the score:

var S = [
  [1,0], //0
  [2,0], //1
  [4,0], //2
  [8,0], //3
  [16,0], //4
  [3,50], //0-1 
  [5,100], //0-2
  [6,75], //1-2
  [20,130], //2-4
  [7,179] //0-1-2 e.g. combining 0-1-2 has a key of 7 (bitwise `1|2|4`) and a score of 179.
];
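
For concreteness, a naive version of that generator might look like the sketch below. `scoreOf` is a hypothetical stand-in for my real scoring function, and full enumeration is only feasible for small ranges (for large ones the generator itself has to be heuristic).

// Sketch only: enumerate every non-empty subset of `range` as a bitmask
// and keep the combinations whose score is non-negative.
function buildS(range, scoreOf) {
  var S = [];
  var total = 1 << range.length;  // 2^n subsets
  for (var mask = 1; mask < total; mask++) {
    var score = scoreOf(mask);    // hypothetical, problem-specific scoring
    if (score >= 0) S.push([mask, score]);
  }
  return S;
}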

The optimal solution, maximizing the score, would be:

var solution = [[8,3,20],180];

Where solution[0] is an array of combinations from S, and solution[1] is the resulting score. Note that the chosen keys are pairwise disjoint (8 & 3, 8 & 20, and 3 & 20 are all 0) and that 8 | 3 | 20 == 31 covers the whole range, signifying that each node is used exactly once.

Problem specifics: Each node must be used exactly once and the score for the single-node combinations will always be 0, as shown in the S array above.
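
To make that constraint concrete, a candidate solution can be validated as in the sketch below: the keys must be pairwise disjoint and their bitwise OR must cover every node in the range.

// Sketch: true iff `masks` uses every node in `range` exactly once.
function isValid(masks, range) {
  var used = 0;
  for (var i = 0; i < masks.length; i++) {
    if ((used & masks[i]) !== 0) return false; // a node is used twice
    used |= masks[i];
  }
  return used === (1 << range.length) - 1;     // all nodes covered
}

isValid([8, 3, 20], [0, 1, 2, 3, 4]); // true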

My current solution (seen here) uses dynamic programming and works for small problems. I have seen heuristics involving dynamic programming, such as https://youtu.be/ze1Xa28h_Ns, but can't figure out how I'd apply that to a multi-dimensional problem. Given the problem constraints, what would be a reasonable heuristic to apply?

EDIT: Things I've tried

  • Greedy approach (sort score greatest to least, pick the next viable candidate)
  • Same as above, but sort by score/cardinality(combo)
  • GRASP (randomly perturb each score by up to 10%, re-run the greedy pass on the re-sorted list, and repeat until a better solution hasn't been found in x seconds; see the sketch below)
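
For reference, the GRASP loop I tried looks roughly like the sketch below. It assumes a hypothetical helper greedy(list) that runs a greedy pass like the one in the first answer and returns [masks, score] without mutating its input; the 10% perturbation factor matches what I used.

// Sketch of the GRASP loop: perturb scores, run the greedy pass,
// and keep the best solution; stop once no improvement is found
// within the time budget.
function grasp(S, seconds) {
  var best = greedy(S);                        // hypothetical greedy helper
  var deadline = Date.now() + seconds * 1000;
  while (Date.now() < deadline) {
    var noisy = S.map(function (s) {
      return [s[0], s[1] * (0.9 + 0.2 * Math.random())]; // up to +/-10%
    });
    var candidate = greedy(noisy);
    // re-score the candidate using the true (unperturbed) scores
    var trueScore = 0;
    for (var i = 0; i < candidate[0].length; i++) {
      for (var j = 0; j < S.length; j++) {
        if (S[j][0] === candidate[0][i]) { trueScore += S[j][1]; break; }
      }
    }
    if (trueScore > best[1]) {
      best = [candidate[0], trueScore];
      deadline = Date.now() + seconds * 1000;  // reset the clock on improvement
    }
  }
  return best;
}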
  • What is the "size" that you talk about being 15 or 2000? The maximum weight? The number of items? If the latter, and if you in fact generate the full powerset in memory, then you probably can't get more than around 30 even in a compiled language. Separately, why not remove all negative elements first, since they can never be in any optimal solution? (Unless I've misunderstood.) – j_random_hacker Sep 14 '15 at 12:28
  • Ah, good question. What I call "size" is the length of range. The function that generates the "power sets" is actually a heuristic as well & only gives me a subset of a power set that is typically less than 10x the size (e.g. a problem of size 1000 would generate an `S` of 10,000). – Matt K Sep 14 '15 at 13:18

2 Answers


A reasonable heuristic (the first that comes to mind) would be to iteratively take the feasible element with the largest score, eliminating all elements that have overlapping bits with the selected element.

I would implement this by first sorting in decreasing order by score and then iteratively adding the first element and filtering the list, removing any element that overlaps the selected one.

In javascript:

function comp(a, b) {
  return b[1] - a[1];  // sort descending by score
}
S.sort(comp);

var output = [];
var score = 0;
while (S.length > 0) {
  // Greedily take the highest-scoring element still in the list
  output.push(S[0][0]);
  score += S[0][1];
  // Keep only elements that share no bits with the one just taken
  var newS = [];
  for (var i = 0; i < S.length; i++) {
    if ((S[i][0] & S[0][0]) == 0) {
      newS.push(S[i]);
    }
  }
  S = newS;
}

alert(JSON.stringify([output, score]));

This selects elements 7, 8, and 16, with score 179 (as opposed to the optimal score of 180).

  • I tried this greedy approach, first by sorting by `score` and then sorting by `score/cardinality(combo)` and both yielded results about 4x worse than my best known solution for a problem set of 250. I know the basic knapsack has a simple way to guarantee upper-bound worst case for the greedy heuristic, but is that possible with multiple dimensions? – Matt K Sep 14 '15 at 13:46
  • @MattK it would have been helpful to say in the question that you had already tried greedy by score and it wasn't sufficient for your problem (it would have saved me the time of writing this answer!). Please edit your question to include what you've tried and an example dataset where it performs poorly. – josliber Sep 14 '15 at 13:48
  • You're absolutely right, sorry about that! I was so fixed on a DP-like solution I forgot everything else I had tried until you mentioned it. Edits added – Matt K Sep 14 '15 at 14:49

This problem is really an integer optimization problem, with binary variables x_i indicating whether the i^th element of S is selected, and constraints requiring that each bit is used exactly once. The objective is to maximize the score attained across the selected elements. If we define S_i to be the i^th element of S, L_b to be the indices of elements in S with bit b set, and w_i to be the score associated with element i, and assume there are n elements in set S and k bits, we can write this in mathematical notation as:

max_{x} \sum_{i=1..n} w_i*x_i
s.t.    \sum_{i \in L_b} x_i = 1  \forall b = 1..k
        x_i \in {0, 1}            \forall i = 1..n
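
To make the correspondence to the bitmask representation concrete, here is an illustrative sketch (in the question's JavaScript terms) of how the 0/1 constraint matrix could be built; row b sums the x_i whose element uses bit b:

// Illustrative only: A[b][i] == 1 exactly when element S[i] uses bit b,
// so the constraint "row b sums to 1" means bit b is covered exactly once.
function constraintMatrix(S, k) {
  var A = [];
  for (var b = 0; b < k; b++) {
    var row = [];
    for (var i = 0; i < S.length; i++) {
      row.push((S[i][0] >> b) & 1);
    }
    A.push(row);
  }
  return A;
}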

In many cases, integer programming solvers are much (much, much) more effective than exhaustive search at solving these sorts of problems. Unfortunately, I am not aware of any javascript linear programming libraries (a Google query turned up SimplexJS, glpk.js, and node-lp_solve; I have no experience with any of these and couldn't immediately get any of them to work). As a result, I will do the implementation in R using the lpSolve package.

w <- c(0, 0, 0, 0, 0, 50, 100, 75, 130, 179)  # score of each element of S
elt <- c(1, 2, 4, 8, 16, 3, 5, 6, 20, 7)      # bitmask of each element of S
k <- 5                                        # number of bits (nodes)
library(lpSolve)
# Constraint matrix: one row per bit b, with a 1 for each element that uses
# bit b, so each "= 1" row forces bit b to be covered exactly once.
mod <- lp(direction = "max",
          objective.in = w,
          const.mat = t(sapply(1:k, function(b) 1*(bitwAnd(elt, 2^(b-1)) > 0))),
          const.dir = rep("=", k),
          const.rhs = rep(1, k),
          all.bin = TRUE)
elt[mod$solution > 0.999]  # selected elements
# [1]  8  3 20
mod$objval                 # optimal score
# [1] 180

As you'll note, this is an exact formulation of your problem. However, by setting a timeout (you would actually need the lpSolveAPI package in R rather than lpSolve to do this), you can get the best solution the solver has found by the time the timeout is reached. That solution may not be optimal, and the timeout controls how long the heuristic keeps trying to find better ones; if the solver terminates before the timeout, the solution is guaranteed to be optimal.

  • Shoot, I think you're right, although I was hoping it wouldn't come to LP as that would mean spooling an R process to handle the LP (I haven't found a reliable JS solver either). I haven't tried problems large enough to require bitsets (my R is pretty rusty) but for small instances it does the trick. – Matt K Sep 14 '15 at 17:32
  • There are many other languages you could use instead of R, too (pretty much every major language has an LP solving library or three). It's just going to be so much faster than any DP solution that I think it's the way to go. Unfortunately JavaScript is optimally bad for running LPs because it's stuck in the browser. Can you optimize on the server in a non-javascript language and then send the result back? – josliber Sep 14 '15 at 17:45
  • Yeah, I think that's what I'll have to do (server is running node, so JS is preferable). That node package you listed is new to me though... looks like it uses a C++ binary with a JS wrapper, so I'll probably play around with that before doing it the "right" way – Matt K Sep 14 '15 at 18:25
  • @MattK sounds like the most natural solution. Good luck! – josliber Sep 14 '15 at 18:29