I have a problem that I wonder if I can solve using cvxpy:

The problem: I have a two-dimensional array of integers and I want to split it into two arrays such that each row of the source array ends up in either the 1st or the 2nd array.

The requirement for these arrays is that, for each column, the sum of the integers in array #1 is as close as possible to twice the sum of the integers in array #2.

Example: Consider the input array:

[
  [1, 2, 3, 4],
  [4, 6, 2, 5],
  [3, 9, 1, 2],
  [8, 1, 0, 9],
  [8, 4, 0, 5],
  [9, 8, 0, 4]
]

The sums of its columns are [33, 30, 6, 29], so ideally we are looking for two arrays whose column sums are:

  • Array #1: [22, 20, 4, 19]
  • Array #2: [11, 10, 2, 10]
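As a quick check of the arithmetic above, here is a minimal sketch (assuming NumPy; the names `totals`, `target_1` and `target_2` are only for illustration) that computes the column sums and the ideal 2:1 targets. The last column sum is not divisible by 3, which is why the targets listed above are rounded.

```python
import numpy as np

data = np.array([
    [1, 2, 3, 4],
    [4, 6, 2, 5],
    [3, 9, 1, 2],
    [8, 1, 0, 9],
    [8, 4, 0, 5],
    [9, 8, 0, 4],
])

# Column sums of the whole array
totals = data.sum(axis=0)

# A 2:1 split means array #1 gets two thirds of each column sum
# and array #2 gets the remaining third
target_1 = 2 * totals / 3
target_2 = totals / 3

print(totals)
print(target_1)
print(target_2)
```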

Of course, this is not always possible, but I am looking for the best solution to this problem.

A possible solution for this specific example might be:

  • Array #1:
[
  [1, 2, 3, 4],
  [4, 6, 2, 5],
  [8, 4, 0, 5],
  [9, 8, 0, 4]
]

With column sums: [22, 20, 5, 18]

  • Array #2:
[
  [3, 9, 1, 2],
  [8, 1, 0, 9]
]

With column sums: [11, 10, 1, 11]
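One way to make "as close as possible" concrete for this split is to look at the per-column gap between the sums of array #1 and twice the sums of array #2. The sketch below assumes NumPy and uses the Euclidean length of that gap as the score; the problem statement does not fix a particular measure, so that choice is an assumption.

```python
import numpy as np

array_1 = np.array([
    [1, 2, 3, 4],
    [4, 6, 2, 5],
    [8, 4, 0, 5],
    [9, 8, 0, 4],
])
array_2 = np.array([
    [3, 9, 1, 2],
    [8, 1, 0, 9],
])

sums_1 = array_1.sum(axis=0)
sums_2 = array_2.sum(axis=0)

# Per-column deviation from the ideal "array #1 = 2 * array #2"
gap = sums_1 - 2 * sums_2

# Single number summarising the deviation (assumed measure: Euclidean norm)
score = np.linalg.norm(gap)

print(gap, score)
```

A score of 0 would mean a perfect 2:1 split in every column.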

Any suggestions?

1 Answer

You can use a boolean vector variable to select rows. The only thing left to decide is how much to penalize errors. In this case I just used the norm of the difference vector.

import cvxpy as cp
import numpy as np
data = np.array([
  [1, 2, 3, 4],
  [4, 6, 2, 5],
  [3, 9, 1, 2],
  [8, 1, 0, 9],
  [8, 4, 0, 5],
  [9, 8, 0, 4]
])
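# x[i] is 1 if row i goes to array #1 and 0 if it goes to array #2
x = cp.Variable(data.shape[0], boolean=True)
# x @ data and (1 - x) @ data are the column sums of the two arrays;
# minimize the Euclidean norm of (sums of #1) - 2 * (sums of #2).
# `@` is matrix multiplication (older cvxpy versions also accepted `*` here).
prob = cp.Problem(cp.Minimize(cp.norm((x - 2 * (1 - x)) @ data)))
prob.solve()  # boolean variables need a mixed-integer-capable solver installed
A = np.round(x.value) @ data
B = np.round(1 - x.value) @ data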

A and B are the column sums of the rows assigned to array #1 and array #2, respectively:

(array([21., 20.,  4., 19.]), array([12., 10.,  2., 10.]))
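For an input this small, the result can be cross-checked without a solver by trying every possible assignment of rows. The sketch below is plain brute force over all 2^6 masks, not what cvxpy does internally; the helper name `penalty` and the variables `best`, `array_1` and `array_2` are made up for illustration. It also shows how a 0/1 mask such as `x.value` turns back into the two actual arrays.

```python
from itertools import product

import numpy as np

data = np.array([
    [1, 2, 3, 4],
    [4, 6, 2, 5],
    [3, 9, 1, 2],
    [8, 1, 0, 9],
    [8, 4, 0, 5],
    [9, 8, 0, 4],
])

def penalty(mask):
    """Euclidean norm of (column sums of array #1) - 2 * (column sums of array #2)."""
    m = mask @ data          # column sums of the rows with mask == 1
    n = (1 - mask) @ data    # column sums of the rows with mask == 0
    return np.linalg.norm(m - 2 * n)

# Enumerate every 0/1 assignment of the rows and keep the one with the smallest penalty
candidates = (np.array(bits) for bits in product([0, 1], repeat=data.shape[0]))
best = min(candidates, key=penalty)

# Use the mask to recover the actual rows of each array
selected = best.astype(bool)
array_1 = data[selected]
array_2 = data[~selected]

print(best)
print(array_1.sum(axis=0), array_2.sum(axis=0))
```

On this input the enumeration should land on the same column sums as the solver output above. It only scales to a handful of rows, since the number of masks doubles with each extra row.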
Jacques Kvam
  • Thank you very much for your answer. Would it be possible to explain your solution to someone who does not "speak" the specific mathematical language? I understand that `data` holds my initial array and that `x` is a boolean array whose values represent the selected rows. What I don't understand is: 1) what is the meaning of `cp.norm((x - 2 * (1 - x)) * data)`? I looked for documentation on cvxpy.norm but could not find anything. 2) what is the meaning of the actual values in `x.value` after the problem is solved? – yonathan livny Aug 20 '19 at 07:41
  • 1. `m = x * data` sums the first set of vectors. `n = (1-x) * data` sums the second set of vectors. I want m = 2*n but that may not be possible so I compute `m-2*n`. Then I take the norm which is the sqrt of the sum of squares of the vector. The closer this is to zero the "better" the solution is. 2. x.value is the boolean array to select values for the first set of vectors. Then to get the second set, you can just do `1-x.value` :) I hope that helps. – Jacques Kvam Aug 22 '19 at 05:15