
For example, here is a matrix:

[1, 0, 0, 0],
[1, 1, 0, 0],
[1, 0, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 1],

I want to find a set of rows whose sum equals [4, 3, 2, 1]. The expected answer is rows {0, 1, 3, 4}, because:

[1, 0, 0, 0] + [1, 1, 0, 0] + [1, 1, 1, 0] + [1, 1, 1, 1] = [4, 3, 2, 1]
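
As a quick check, that sum can be reproduced with a few lines of Python (illustration only):

rows = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
picked = [0, 1, 3, 4]
# column-wise sum of the picked rows
print([sum(rows[i][j] for i in picked) for j in range(4)])  # [4, 3, 2, 1]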

Are there any well-known or related algorithms for solving this problem?

Thanks to @sascha and @N. Wouda for the comments. To clarify, here are some more details.

In my problem, the matrix will have about 50 rows and 25 columns, but each row will have fewer than 4 nonzero elements (the rest are zero), and every solution consists of 8 rows.

If I try all combinations, C(50, 8) is about 0.54 billion attempts, which is far too many, so I want to find a more efficient algorithm.
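
For reference, that count can be checked with a one-liner (illustration only):

import math
print(math.comb(50, 8))  # 536878650, roughly 0.54 billion combinations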

  • You tagged integer-programming already. [Just use it](https://stackoverflow.com/a/62488936/2320035). Will be hard to beat. Everything else probably depends on more *rules* behind this task that are not really explained by this single example (from trivial systems of linear equations to NP-hard exact-cover-like problems: everything is possible). – sascha Jun 28 '20 at 12:12
  • This should not be too hard as a constraint programming problem, with a binary variable `x_i` for each row. The constraints are then that the selected rows should sum element-wise to your given right-hand side. You could have a look at [OR-Tools CP solver](https://developers.google.com/optimization/cp/cp_solver). – Nelewout Jun 28 '20 at 12:18
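
A minimal sketch of that constraint-programming formulation, assuming the `ortools` package is installed (this CP-SAT model illustrates the comment above and is not code from either answer):

from ortools.sat.python import cp_model

rows = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
target = [4, 3, 2, 1]

model = cp_model.CpModel()
# one binary variable per row: 1 means the row is selected
x = [model.NewBoolVar(f'x{i}') for i in range(len(rows))]

# the selected rows must sum element-wise to the target vector
for j in range(len(target)):
    model.Add(sum(rows[i][j] * x[i] for i in range(len(rows))) == target[j])

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print([i for i in range(len(rows)) if solver.Value(x[i])])  # e.g. [0, 1, 3, 4]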

2 Answers


If you want to make the jump to using a solver, I'd recommend it. This is a pretty straightforward integer program. The solutions below use Python, the Pyomo mathematical-programming package to formulate the problem, and COIN-OR's CBC solver for integer and mixed-integer programs, which needs to be installed separately (it is free software), available at https://www.coin-or.org/downloading/

Here is an example with your data, followed by an example with 100,000 rows. The first example solves instantly; the 100,000-row example takes about 2 seconds on my machine.

# row selection Integer Program
import pyomo.environ as pyo

data1 = [   [1, 0, 0, 0],
            [1, 1, 0, 0],
            [1, 0, 1, 0],
            [1, 1, 1, 0],
            [1, 1, 1, 1],]


data_dict = {(i, j): data1[i][j] for i in range(len(data1)) for j in range(len(data1[0]))}

model = pyo.ConcreteModel()

# sets
model.I = pyo.Set(initialize=range(len(data1)))     # a simple row index
model.J = pyo.Set(initialize=range(len(data1[0])))  # a simple column index

# parameters
model.matrix = pyo.Param(model.I , model.J, initialize=data_dict)   # hold the sparse matrix of values
magic_sum = [4, 3, 2, 1 ]

# variables
model.row_select = pyo.Var(model.I, domain=pyo.Boolean) # row selection variable

# constraints
# ensure the columnar sum is at least the magic sum for all j
def min_sum(model, j):
    return sum(model.row_select[i] * model.matrix[(i, j)] for i in model.I) >= magic_sum[j] 
model.c1 = pyo.Constraint(model.J, rule=min_sum)

# objective function
# minimize the total overage above the magic sum
def objective(model):
    delta = 0
    for j in model.J:
        delta += sum(model.row_select[i] * model.matrix[i, j] for i in model.I) - magic_sum[j] 

    return delta

model.OBJ = pyo.Objective(rule=objective)


model.pprint()  # verify everything

solver = pyo.SolverFactory('cbc')  # need to have cbc solver installed

result = solver.solve(model)

result.write()  # solver details

model.row_select.display() # output

Output:

# ----------------------------------------------------------
#   Solver Information
# ----------------------------------------------------------
Solver: 
- Status: ok
  User time: -1.0
  System time: 0.0
  Wallclock time: 0.0
  Termination condition: optimal
  Termination message: Model was solved to optimality (subject to tolerances), and an optimal solution is available.
  Statistics: 
    Branch and bound: 
      Number of bounded subproblems: 0
      Number of created subproblems: 0
    Black box: 
      Number of iterations: 0
  Error rc: 0
  Time: 0.01792597770690918
# ----------------------------------------------------------
#   Solution Information
# ----------------------------------------------------------
Solution: 
- number of solutions: 0
  number of solutions displayed: 0
row_select : Size=5, Index=I
    Key : Lower : Value : Upper : Fixed : Stale : Domain
      0 :     0 :   1.0 :     1 : False : False : Boolean
      1 :     0 :   1.0 :     1 : False : False : Boolean
      2 :     0 :   0.0 :     1 : False : False : Boolean
      3 :     0 :   1.0 :     1 : False : False : Boolean
      4 :     0 :   1.0 :     1 : False : False : Boolean

A more stressful rendition with 100,000 rows:

# row selection Integer Program stress test
import pyomo.environ as pyo
import numpy as np

# make a large matrix 100,000 x 8
data1 = np.random.randint(0, 1000, size=(100_000, 8))
# inject "the right answer into 3 rows"
data1[42602]    = [8, 0, 0, 0, 0, 0, 0, 0 ]
data1[3]        = [0, 0, 0, 0, 4, 3, 2, 1 ]
data1[10986]    = [0, 7, 6, 5, 0, 0, 0, 0 ]

data_dict = {(i, j): data1[i][j] for i in range(len(data1)) for j in range(len(data1[0]))}

model = pyo.ConcreteModel()

# sets
model.I = pyo.Set(initialize=range(len(data1)))     # a simple row index
model.J = pyo.Set(initialize=range(len(data1[0])))  # a simple column index

# parameters
model.matrix = pyo.Param(model.I , model.J, initialize=data_dict)   # hold the sparse matrix of values
magic_sum = [8, 7, 6, 5, 4, 3, 2, 1 ]

# variables
model.row_select = pyo.Var(model.I, domain=pyo.Boolean) # row selection variable

# constraints
# ensure the columnar sum is at least the magic sum for all j
def min_sum(model, j):
    return sum(model.row_select[i] * model.matrix[(i, j)] for i in model.I) >= magic_sum[j] 
model.c1 = pyo.Constraint(model.J, rule=min_sum)

# objective function
# minimize the total overage above the magic sum
def objective(model):
    delta = 0
    for j in model.J:
        delta += sum(model.row_select[i] * model.matrix[i, j] for i in model.I) - magic_sum[j] 

    return delta

model.OBJ = pyo.Objective(rule=objective)

solver = pyo.SolverFactory('cbc')

result = solver.solve(model)
result.write()


print('\n\n======== row selections =======')
for i in model.I:
    if model.row_select[i].value > 0:
        print (f'row {i} selected')

Output:

# ----------------------------------------------------------
#   Solver Information
# ----------------------------------------------------------
Solver: 
- Status: ok
  User time: -1.0
  System time: 2.18
  Wallclock time: 2.61
  Termination condition: optimal
  Termination message: Model was solved to optimality (subject to tolerances), and an optimal solution is available.
  Statistics: 
    Branch and bound: 
      Number of bounded subproblems: 0
      Number of created subproblems: 0
    Black box: 
      Number of iterations: 0
  Error rc: 0
  Time: 2.800779104232788
# ----------------------------------------------------------
#   Solution Information
# ----------------------------------------------------------
Solution: 
- number of solutions: 0
  number of solutions displayed: 0


======== row selections =======
row 3 selected
row 10986 selected
row 42602 selected
  • Why does it say "number of solutions: 0"? – Erwin Kalvelagen Jun 29 '20 at 04:34
  • @ErwinKalvelagen That, sir, is an excellent question! I've assumed it is something in the generic interface between `pyomo` and the solver `cbc`. I haven't used Gurobi in a while, but I think I remember it did populate that part of the report. I usually rely on the "termination condition" part of the report. Probably worthy of a secondary question. – AirSquid Jun 29 '20 at 15:06
  • It's a very efficient solution. Thank you very much, @ErwinKalvelagen. But I wonder, is there a method to find all solutions? – progquester Jul 01 '20 at 08:23
  • Thanking me is misplaced. This was the work of @JeffH. But yes: some solvers can produce all (or many) solutions. For large problems, enumerating all solutions may not be feasible (there may just be too many of them). – Erwin Kalvelagen Jul 01 '20 at 08:31
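
On the "all solutions" point, one option (an illustration only, not part of the answer above) is to let a CP solver enumerate every feasible selection. In recent versions of OR-Tools CP-SAT this can be done with a solution callback; the `Collector` class name below is made up:

from ortools.sat.python import cp_model

rows = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
target = [4, 3, 2, 1]

class Collector(cp_model.CpSolverSolutionCallback):
    """Prints each feasible row selection as it is found."""
    def __init__(self, x):
        cp_model.CpSolverSolutionCallback.__init__(self)
        self.x = x
    def on_solution_callback(self):
        print([i for i, v in enumerate(self.x) if self.Value(v)])

model = cp_model.CpModel()
x = [model.NewBoolVar(f'x{i}') for i in range(len(rows))]
for j in range(len(target)):
    model.Add(sum(rows[i][j] * x[i] for i in range(len(rows))) == target[j])

solver = cp_model.CpSolver()
solver.parameters.enumerate_all_solutions = True  # no objective, so every feasible solution is reported
solver.Solve(model, Collector(x))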

This one recursively either skips or picks each element. As soon as the subtree is impossible to solve (no elements left, or any target value negative) it returns false. Once the sum of the target reaches 0, a solution has been found and is returned in the form of the picked indices.

Feel free to add time and memory complexity in the comments. The worst case should be around 2^(n+1) calls.

Please let me know how it performs on your 8/50 data.

const elements = [
  [1, 0, 0, 0],
  [1, 1, 0, 0],
  [1, 0, 1, 0],
  [1, 1, 1, 0],
  [1, 1, 1, 1]
];

const target = [4, 3, 2, 1];

let iterations = 0;

console.log(iter(elements, target, [], 0));
console.log(`Iterations: ${iterations}`);

function iter(elements, target, picked, index) {
  iterations++;

  // remaining target total; 0 means the target vector has been fully matched
  const sum = target.reduce(function(acc, value) {
    return acc + value;
  });

  if (sum === 0) return picked;             // solution found: return the picked row indices
  if (elements.length === 0) return false;  // no rows left to try

  // branch 1: skip the current row
  const result = iter(
    removeElement(elements, 0),
    target,
    picked,
    index + 1
  );

  if (result !== false) return result;

  // branch 2: pick the current row and subtract it from the target
  const newTarget = matrixSubtract(target, elements[0]);
  const hasNegatives = newTarget.some(function(element) {
    return element < 0;
  });

  if (hasNegatives) return false; // overshot the target, prune this branch

  return iter(
    removeElement(elements, 0),
    newTarget,
    picked.concat(index),
    index + 1
  );
}

function removeElement(target, i) {
  return target.slice(0, i).concat(target.slice(i + 1));
}

function matrixSubtract(minuend, subtrahend) {
  // element-wise subtraction: minuend[i] - subtrahend[i]
  return minuend.map(function(element, i) {
    return element - subtrahend[i];
  });
}
  • Maybe the complexity of this algorithm is too high for my problem. The recursion can be treated as trying every binary number from [0,0,0,...,0,0,0] to [1,1,1,...,1,1,1]. In my problem, the worst case would try 2^50 combinations, which is too slow. – progquester Jul 01 '20 at 02:02