1

I need to find the cells of a contingency table given the marginal distributions.

2574 2572 3393 3768 3822 b e
x₁₁ x₁₂ x₁₃ x₁₄ x₁₅ 187 23846753.74
x₂₁ x₂₂ x₂₃ x₂₄ x₂₅ 3 324024.64
x₃₁ x₃₂ x₃₃ x₃₄ x₃₅ 13755 1489591510.50
x₄₁ x₄₂ x₄₃ x₄₄ x₄₅ 543 76173239.22
x₅₁ x₅₂ x₅₃ x₅₄ x₅₅ 68 8188751.57
x₆₁ x₆₂ x₆₃ x₆₄ x₆₅ 1332 172945247.86
x₇₁ x₇₂ x₇₃ x₇₄ x₇₅ 361 41675606.70

xᵢⱼ are non-negative integers I want to find. The column sums are exact. The row sums (b) are approximate.

There are additional constraints to guide the distribution of the xᵢⱼ:

Given constant factors F = (13336.41847153, 102412.73466321, 41811.01724119, 78689.83110577, 282353.66682778)T and the expected result e then X·F ≈ e

Is this a Mixed-Integer Quadratic Programming problem if I want to minimize the deviations of the approximate equalities?

  • b are the column sums
  • A is a vector of ones, so that every column gets summed up
  • d are the row sums
  • C is a vector of ones, such that every row gets summed up
  • e is a vector of expected weighted sums
  • F is a constant vector of weights

The optimization problem can be formulated as

  • min (‖ X C – d ‖² + ‖ X F – e ‖²)
  • such that A X = b
  • x ≥ 0
  • x ∈ ℕ

I’ve tried to solve this using cvxpy:

import cvxpy as cp
import numpy as np

F = np.array([13336.41847153, 102412.73466321, 41811.01724119, 78689.83110577, 282353.66682778])
e = [23846753.74, 324024.64, 1489591510.50, 76173239.22, 8188751.57, 172945247.86, 41675606.70]
d = [187., 3., 13755., 543., 68., 1332., 361.]
C = np.ones(len(F))
b = np.array([2574, 2572, 3393, 3768, 3822])
A = np.ones(len(d))

x = cp.Variable((len(e), len(b)), integer=True)
cost = cp.sum_squares(x @ C - d) + cp.sum_squares(x @ F - e)
objective = cp.Minimize(cost)
constraint_gt0 = x >= 0
constraint_eq = A @ x == b
    
problem = cp.Problem(objective, [constraint_gt0, constraint_eq])
solution = problem.solve()

But this results in the error message:

Either candidate conic solvers (['GLPK_MI', 'SCIPY']) do not support the cones output by the problem (SOC, NonNeg, Zero), or there are not enough constraints in the problem.

If I remove the integer=True constraint the method completes without error but doesn’t find a solution.

There are many of these tables to solve, which are all independent of each other. So I need a solution in code, not the set of x for the particular given example.

My questions:

Is this a well-formed problem and solvable?

Is it under-constrained like the error message suggests?

Why does it result in a Second-Order Cone (SOC) as mentioned in the error message? That sounds unnecessarily complicated. I thought this is a “least squares problem with some extra constraints”.

How can I solve the problem with cvxpy or some other Python package?

If an approximate, non-integer solution is more feasible, how would I achieve that?

hfs
  • 2,433
  • 24
  • 37
  • 1
    This is called a **matrix-balancing** problem. Try using Cplex as solver (the freebie version should suffice). – Erwin Kalvelagen Jan 25 '23 at 01:54
  • 1
    The quadratic part is translated to a SOC internally for conic solvers and to a QP for QP solvers. You need a solver which supports one of that combined with mixed-integer from the table in https://www.cvxpy.org/tutorial/advanced/index.html#choosing-a-solver – Michal Adamaszek Jan 25 '23 at 08:01

1 Answers1

0

Thanks to the hints of @ErwinKalvelagen and @MichalAdamaszek: the answer is CPLEX. You need a solver that is capable of solving this class of problems. See Choosing a solver in the cvxpy documentation.

As an alternative if you don’t want to use CPLEX, by replacing cp.sum_squares with cp.norm in the cost function and removing the integer=True constraint, the integrated solvers are able to find a floating point solution, too.

hfs
  • 2,433
  • 24
  • 37