I need to find the cells of a contingency table given the marginal distributions.
2574 | 2572 | 3393 | 3768 | 3822 | b | e |
---|---|---|---|---|---|---|
x₁₁ | x₁₂ | x₁₃ | x₁₄ | x₁₅ | 187 | 23846753.74 |
x₂₁ | x₂₂ | x₂₃ | x₂₄ | x₂₅ | 3 | 324024.64 |
x₃₁ | x₃₂ | x₃₃ | x₃₄ | x₃₅ | 13755 | 1489591510.50 |
x₄₁ | x₄₂ | x₄₃ | x₄₄ | x₄₅ | 543 | 76173239.22 |
x₅₁ | x₅₂ | x₅₃ | x₅₄ | x₅₅ | 68 | 8188751.57 |
x₆₁ | x₆₂ | x₆₃ | x₆₄ | x₆₅ | 1332 | 172945247.86 |
x₇₁ | x₇₂ | x₇₃ | x₇₄ | x₇₅ | 361 | 41675606.70 |
xᵢⱼ are non-negative integers I want to find. The column sums are exact. The row sums (b) are approximate.
There are additional constraints to guide the distribution of the xᵢⱼ:
Given constant factors F = (13336.41847153, 102412.73466321, 41811.01724119, 78689.83110577, 282353.66682778)T and the expected result e then X·F ≈ e
Is this a Mixed-Integer Quadratic Programming problem if I want to minimize the deviations of the approximate equalities?
- b are the column sums
- A is a vector of ones, so that every column gets summed up
- d are the row sums
- C is a vector of ones, such that every row gets summed up
- e is a vector of expected weighted sums
- F is a constant vector of weights
The optimization problem can be formulated as
- min (‖ X C – d ‖² + ‖ X F – e ‖²)
- such that A X = b
- x ≥ 0
- x ∈ ℕ
I’ve tried to solve this using cvxpy:
import cvxpy as cp
import numpy as np
F = np.array([13336.41847153, 102412.73466321, 41811.01724119, 78689.83110577, 282353.66682778])
e = [23846753.74, 324024.64, 1489591510.50, 76173239.22, 8188751.57, 172945247.86, 41675606.70]
d = [187., 3., 13755., 543., 68., 1332., 361.]
C = np.ones(len(F))
b = np.array([2574, 2572, 3393, 3768, 3822])
A = np.ones(len(d))
x = cp.Variable((len(e), len(b)), integer=True)
cost = cp.sum_squares(x @ C - d) + cp.sum_squares(x @ F - e)
objective = cp.Minimize(cost)
constraint_gt0 = x >= 0
constraint_eq = A @ x == b
problem = cp.Problem(objective, [constraint_gt0, constraint_eq])
solution = problem.solve()
But this results in the error message:
Either candidate conic solvers (['GLPK_MI', 'SCIPY']) do not support the cones output by the problem (SOC, NonNeg, Zero), or there are not enough constraints in the problem.
If I remove the integer=True
constraint the method completes without error but doesn’t find a solution.
There are many of these tables to solve, which are all independent of each other. So I need a solution in code, not the set of x for the particular given example.
My questions:
Is this a well-formed problem and solvable?
Is it under-constrained like the error message suggests?
Why does it result in a Second-Order Cone (SOC) as mentioned in the error message? That sounds unnecessarily complicated. I thought this is a “least squares problem with some extra constraints”.
How can I solve the problem with cvxpy or some other Python package?
If an approximate, non-integer solution is more feasible, how would I achieve that?