How to fill a contingency table from the marginal distributions given some optimization constraints?

Question

I need to find the cells of a contingency table given the marginal distributions.

2574	2572	3393	3768	3822	b	e
x₁₁	x₁₂	x₁₃	x₁₄	x₁₅	187	23846753.74
x₂₁	x₂₂	x₂₃	x₂₄	x₂₅	3	324024.64
x₃₁	x₃₂	x₃₃	x₃₄	x₃₅	13755	1489591510.50
x₄₁	x₄₂	x₄₃	x₄₄	x₄₅	543	76173239.22
x₅₁	x₅₂	x₅₃	x₅₄	x₅₅	68	8188751.57
x₆₁	x₆₂	x₆₃	x₆₄	x₆₅	1332	172945247.86
x₇₁	x₇₂	x₇₃	x₇₄	x₇₅	361	41675606.70

xᵢⱼ are non-negative integers I want to find. The column sums are exact. The row sums (b) are approximate.

There are additional constraints to guide the distribution of the xᵢⱼ:

Given constant factors F = (13336.41847153, 102412.73466321, 41811.01724119, 78689.83110577, 282353.66682778)^T and the expected result e then X·F ≈ e

Is this a Mixed-Integer Quadratic Programming problem if I want to minimize the deviations of the approximate equalities?

b are the column sums
A is a vector of ones, so that every column gets summed up
d are the row sums
C is a vector of ones, such that every row gets summed up
e is a vector of expected weighted sums
F is a constant vector of weights

The optimization problem can be formulated as

min (‖ X C – d ‖² + ‖ X F – e ‖²)
such that A X = b
x ≥ 0
x ∈ ℕ

I’ve tried to solve this using cvxpy:

import cvxpy as cp
import numpy as np

F = np.array([13336.41847153, 102412.73466321, 41811.01724119, 78689.83110577, 282353.66682778])
e = [23846753.74, 324024.64, 1489591510.50, 76173239.22, 8188751.57, 172945247.86, 41675606.70]
d = [187., 3., 13755., 543., 68., 1332., 361.]
C = np.ones(len(F))
b = np.array([2574, 2572, 3393, 3768, 3822])
A = np.ones(len(d))

x = cp.Variable((len(e), len(b)), integer=True)
cost = cp.sum_squares(x @ C - d) + cp.sum_squares(x @ F - e)
objective = cp.Minimize(cost)
constraint_gt0 = x >= 0
constraint_eq = A @ x == b
    
problem = cp.Problem(objective, [constraint_gt0, constraint_eq])
solution = problem.solve()

But this results in the error message:

Either candidate conic solvers (['GLPK_MI', 'SCIPY']) do not support the cones output by the problem (SOC, NonNeg, Zero), or there are not enough constraints in the problem.

If I remove the integer=True constraint the method completes without error but doesn’t find a solution.

There are many of these tables to solve, which are all independent of each other. So I need a solution in code, not the set of x for the particular given example.

My questions:

Is this a well-formed problem and solvable?

Is it under-constrained like the error message suggests?

Why does it result in a Second-Order Cone (SOC) as mentioned in the error message? That sounds unnecessarily complicated. I thought this is a “least squares problem with some extra constraints”.

How can I solve the problem with cvxpy or some other Python package?

If an approximate, non-integer solution is more feasible, how would I achieve that?

This is called a **matrix-balancing** problem. Try using Cplex as solver (the freebie version should suffice). — Erwin Kalvelagen, Jan 25 '23 at 01:54
The quadratic part is translated to a SOC internally for conic solvers and to a QP for QP solvers. You need a solver which supports one of that combined with mixed-integer from the table in https://www.cvxpy.org/tutorial/advanced/index.html#choosing-a-solver — Michal Adamaszek, Jan 25 '23 at 08:01

score 0 · Answer 1 · answered Jan 26 '23 at 14:55

Thanks to the hints of @ErwinKalvelagen and @MichalAdamaszek: the answer is CPLEX. You need a solver that is capable of solving this class of problems. See Choosing a solver in the cvxpy documentation.

As an alternative if you don’t want to use CPLEX, by replacing cp.sum_squares with cp.norm in the cost function and removing the integer=True constraint, the integrated solvers are able to find a floating point solution, too.

How to fill a contingency table from the marginal distributions given some optimization constraints?

1 Answers1