Some remarks:
Intro
Question: Reproducible code
- The question setup is lazy: you could have included the imports and CSV-reading code instead of making us replicate it...
- Because everyone has to write their own parsing, it's now easy to end up experimenting on slightly different data...
Task + Approach
- This looks like a combination of:
- a general-purpose math-opt solver
- machine-learning-scale data
- = a combination that often hits limits!
Solvers: Expectations
- SCS is a first-order method and is expected to be less robust than ECOS or other second-order (interior-point) methods
Your Model
Observations
- Solver: SCS (default opts)
- Fails / goes haywire judging by the progress of the residuals (probably due to numerical issues)
- Looks even worse on my machine
- Solver: ECOS (default opts)
- Fails (reports primal infeasible, due to numerical issues)
Analysis
- You won't be able to fix these numerical issues by increasing iteration limits alone
- More involved tuning of feasibility tolerances and related settings would be needed (a sketch of how such options are passed through CVXPY follows below)!
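For completeness, here is a minimal, hedged sketch of how such tolerance and iteration options are forwarded through CVXPY, on stand-in random data rather than the question's CSVs (option names correspond to SCS 2.x and ECOS; verify against your installed versions):

import numpy as np
import cvxpy as cp

# stand-in data only, not the question's CSVs
A_demo = np.random.randn(50, 10)
b_demo = np.random.randn(50)
x = cp.Variable(10)
prob = cp.Problem(cp.Minimize(cp.sum_squares(A_demo @ x - b_demo)),
                  [x >= 0, cp.sum(x) == 1])

# SCS: raise the iteration limit, tighten the convergence tolerance
prob.solve(solver=cp.SCS, max_iters=20000, eps=1e-6, verbose=True)

# ECOS: adjust feasibility/optimality tolerances and the iteration limit
prob.solve(solver=cp.ECOS, max_iters=200,
           feastol=1e-8, abstol=1e-7, reltol=1e-7, verbose=True)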
Transformation / Fixing
Can we make those solvers work? I think so.
Instead of minimizing the sum of squares, let's minimize the l2-norm. This is equivalent with regard to the solution (squaring is monotone on nonnegative values, so the argmin is unchanged), and we can simply square the objective value afterwards if we are interested in it!
This is motivated by the following advice (from the CVX users' guide):
One particular reformulation that we strongly encourage is to eliminate quadratic forms—that is, functions like sum_square, sum(square(.)) or quad_form—whenever it is possible to construct equivalent models using norm instead. Our experience tells us that quadratic forms often pose a numerical challenge for the underlying solvers that CVX uses.
We acknowledge that this advice goes against conventional wisdom: quadratic forms are the prototypical smooth convex function, while norms are nonsmooth and therefore unwieldy. But with the conic solvers that CVX uses, this wisdom is exactly backwards. It is the norm that is best suited for conic formulation and solution. Quadratic forms are handled by converting them to a conic form—using norms, in fact! This conversion process poses some interesting scaling challenges. It is better if the modeler can eliminate the need to perform this conversion.
Code
import pandas as pd
import cvxpy as cp
import numpy as np

# read the data posted in the question
A = pd.read_csv('A_matrix.csv').to_numpy()
b = pd.read_csv('b_vector.csv').to_numpy().ravel()

# norm-based reformulation: minimize ||A x - b||_2 over the probability simplex
x = cp.Variable(61)
prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b)), [x >= 0, cp.sum(x) == 1])
result = prob.solve(solver=cp.SCS, verbose=True)

print("optimal value: ", prob.value)
print("cvxpy solution:")
print(x.value, np.sum(x.value))
Output solver=cp.SCS (slow CPU)
Valid solver state + slow + the solution does not look robust enough -> it fluctuates symmetrically around 0 -> large primal-feasibility error with respect to x >= 0
Could probably be improved by tuning, but it's likely better to use a different solver here; I did not analyze this further. (A small clean-up sketch follows after this output.)
----------------------------------------------------------------------------
SCS v2.1.2 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 2446295
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 62, constraints m = 278543
Cones: primal zero / dual free vars: 1
linear vars: 61
soc vars: 278481, soc blks: 1
Setup time: 1.63e+00s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
0| 9.14e+18 1.39e+20 1.00e+00 -5.16e+20 6.20e+23 6.04e+23 1.28e-01
100| 8.63e-01 1.90e+02 7.79e-01 5.96e+02 4.81e+03 1.17e-14 1.04e+01
200| 5.09e-02 3.50e+02 1.00e+00 6.20e+03 -1.16e+02 5.88e-15 2.08e+01
300| 3.00e-01 3.71e+03 7.64e-01 9.62e+03 7.19e+04 4.05e-15 3.17e+01
400| 5.19e-02 1.76e+02 1.91e-01 4.71e+03 6.94e+03 3.87e-15 4.25e+01
500| 4.60e-02 2.66e+02 2.83e-01 5.70e+03 1.02e+04 6.48e-15 5.25e+01
600| 5.13e-03 1.08e+02 1.24e-01 5.80e+03 7.44e+03 1.72e-14 6.23e+01
700| 3.35e-03 6.81e+01 9.64e-02 5.39e+03 4.44e+03 5.94e-15 7.15e+01
800| 1.62e-02 8.52e+01 1.17e-01 5.51e+03 6.97e+03 3.96e-15 8.07e+01
900| 1.93e-02 1.57e+01 1.89e-02 5.58e+03 5.38e+03 5.04e-15 8.98e+01
1000| 6.94e-03 6.85e+00 7.97e-03 5.57e+03 5.48e+03 1.75e-15 9.91e+01
1100| 4.64e-03 7.64e+00 1.42e-02 5.66e+03 5.50e+03 1.91e-15 1.09e+02
1200| 2.25e-04 3.25e-01 4.00e-04 5.61e+03 5.60e+03 5.33e-15 1.18e+02
1300| 4.73e-05 9.05e-02 5.78e-05 5.60e+03 5.60e+03 6.16e-15 1.28e+02
1400| 6.27e-07 4.58e-03 3.22e-06 5.60e+03 5.60e+03 7.17e-15 1.36e+02
1500| 2.02e-07 5.27e-05 4.58e-08 5.60e+03 5.60e+03 5.61e-15 1.46e+02
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.46e+02s
Lin-sys: nnz in L factor: 2726730, avg solve time: 2.54e-02s
Cones: avg projection time: 1.16e-03s
Acceleration: avg step time: 5.61e-02s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 7.7307e-12, dist(y, K*) = 0.0000e+00, s'y/|s||y| = 3.0820e-18
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 2.0159e-07
dual res: |A'y + c|_2 / (1 + |c|_2) = 5.2702e-05
rel gap: |c'x + b'y| / (1 + |c'x| + |b'y|) = 4.5764e-08
----------------------------------------------------------------------------
c'x = 5602.9367, -b'y = 5602.9362
============================================================================
optimal value: 5602.936687635506
cvxpy solution:
[ 1.33143619e-06 -5.20173272e-07 -5.63980428e-08 -9.44340768e-08
6.07765135e-07 7.55998810e-07 8.45038786e-07 2.65626921e-06
-1.35669263e-07 -4.88286704e-07 -1.09285233e-06 8.63799377e-07
2.85145903e-07 -1.22240651e-06 2.14526505e-07 -2.40179173e-06
-1.75042884e-07 -1.27680170e-06 -1.40486649e-06 -1.12113037e-06
-2.26601198e-07 1.39878723e-07 -3.19396803e-06 -6.36480154e-07
2.16005860e-05 1.18205616e-06 2.15620316e-06 -1.94093348e-07
-1.88356275e-06 -7.07687270e-06 -1.99902966e-06 -2.28894738e-06
1.00000188e+00 -9.95601469e-07 -1.26333877e-06 1.26336565e-06
-5.31474195e-08 -9.81111443e-07 2.22755569e-07 -7.49418940e-07
-4.77882668e-07 6.89785690e-07 -2.46822613e-06 -5.73596077e-08
5.99307819e-07 -2.57301316e-07 -7.59150986e-07 -1.23753681e-08
-1.39938273e-06 1.48526305e-06 -2.41075790e-06 -3.50224485e-07
1.79214177e-08 6.71852182e-07 -5.10880844e-06 2.44821668e-07
-2.88655782e-06 -2.45457029e-07 -4.97712502e-07 -1.44497848e-06
-2.22294748e-07] 0.9999895863519757
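Not part of the original run, but a minimal clean-up sketch (continuing the script from the Code section) would clip SCS's tiny negative entries and renormalize, so the vector lies on the probability simplex again:

# post-process the SCS solution: enforce x >= 0 and sum(x) == 1 again
x_clean = np.clip(x.value, 0.0, None)
x_clean /= x_clean.sum()
print(x_clean, x_clean.sum())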
Output solver=cp.ECOS (slow CPU)
Valid solver state + much faster + the solution looks OK
ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It pcost dcost gap pres dres k/t mu step sigma IR | BT
0 +0.000e+00 -0.000e+00 +7e+04 9e-01 1e-04 1e+00 1e+03 --- --- 1 2 - | - -
1 -5.108e+01 -4.292e+01 +7e+04 6e-01 1e-04 9e+00 1e+03 0.0218 3e-01 4 5 4 | 0 0
2 +2.187e+02 +2.387e+02 +5e+04 6e-01 8e-05 2e+01 8e+02 0.3427 4e-02 4 5 5 | 0 0
3 +1.109e+03 +1.118e+03 +1e+04 3e-01 2e-05 9e+00 2e+02 0.7403 6e-03 4 5 5 | 0 0
4 +1.873e+03 +1.888e+03 +1e+04 2e-01 2e-05 1e+01 2e+02 0.2332 1e-01 5 6 6 | 0 0
5 +3.534e+03 +3.565e+03 +4e+03 8e-02 8e-06 3e+01 7e+01 0.7060 2e-01 5 6 6 | 0 0
6 +5.452e+03 +5.453e+03 +2e+02 2e-03 2e-07 1e+00 3e+00 0.9752 2e-03 6 8 8 | 0 0
7 +5.584e+03 +5.585e+03 +4e+01 4e-04 4e-08 4e-01 7e-01 0.8069 6e-02 2 2 2 | 0 0
8 +5.602e+03 +5.602e+03 +5e+00 5e-05 6e-09 8e-02 9e-02 0.9250 5e-02 2 2 2 | 0 0
9 +5.603e+03 +5.603e+03 +1e-01 1e-06 1e-10 2e-03 2e-03 0.9798 2e-03 5 5 5 | 0 0
10 +5.603e+03 +5.603e+03 +6e-03 4e-07 6e-12 9e-05 1e-04 0.9498 3e-04 5 5 5 | 0 0
11 +5.603e+03 +5.603e+03 +4e-04 4e-07 3e-13 7e-06 6e-06 0.9890 4e-02 1 2 2 | 0 0
12 +5.603e+03 +5.603e+03 +1e-05 4e-08 8e-15 2e-07 2e-07 0.9816 8e-03 1 2 2 | 0 0
13 +5.603e+03 +5.603e+03 +2e-07 7e-10 1e-16 2e-09 2e-09 0.9890 1e-04 5 3 4 | 0 0
OPTIMAL (within feastol=7.0e-10, reltol=2.7e-11, abstol=1.5e-07).
Runtime: 18.727676 seconds.
optimal value: 5602.936884707248
cvxpy solution:
[7.47985848e-11 3.58238148e-11 4.53994101e-11 3.73056632e-11
3.47224797e-11 3.62895261e-11 3.59367993e-11 4.03642466e-11
3.58643375e-11 3.24886989e-11 3.25080912e-11 3.34866983e-11
3.66296670e-11 3.89612422e-11 3.54489152e-11 7.07301971e-11
3.95949165e-11 3.68235605e-11 3.05918372e-11 3.43890675e-11
3.71817538e-11 3.62561876e-11 3.55281653e-11 3.55800928e-11
4.10876077e-11 4.12877804e-11 4.11174782e-11 3.35519296e-11
3.43716575e-11 3.56588133e-11 3.66118962e-11 3.68789703e-11
9.99999998e-01 3.34857869e-11 3.21984616e-11 5.82577263e-11
2.85751155e-11 3.64710243e-11 3.59930485e-11 5.04742702e-11
3.07026084e-11 3.36507487e-11 4.19786324e-11 8.35032700e-11
3.33575857e-11 3.42732986e-11 3.70599423e-11 4.73856413e-11
3.39708564e-11 3.64354428e-11 2.95022064e-11 3.46315519e-11
3.04124702e-11 4.07870093e-11 3.57782184e-11 3.71824186e-11
3.72394185e-11 4.48194963e-11 4.09635820e-11 6.45638394e-11
4.00297122e-11] 0.9999999999673748
Outro
Final remarks
- The above reformulation might be enough to let you solve your problem
- ECOS beating SCS on this large-scale problem is unintuitive, but it can always happen, and I won't analyze it here (SCS is still a great solver, and so is ECOS, despite the two taking very different approaches; the open-source community should be happy to have both!)
- I don't think I would go for these solvers with ML-scale data if I had time to implement something more customized
- The easiest approach that comes to mind for large-scale solving here:
- Projected (accelerated) gradient, a first-order method that is very robust with respect to your constraints (a sketch follows after this list)
- "projection onto the probability simplex" (which you need) is a common, well-researched operation
- Going by the final weights/coefficients, the data looks strange!
- There seems to be one strongly dominating column (information leakage?); I don't know
- After rounding, both solvers output the solution vector:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
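To make the projected-gradient suggestion concrete, here is a minimal FISTA-style sketch of my own (random stand-in data, not your CSVs; the simplex projection is the standard sort-based algorithm, e.g. Duchi et al. 2008):

import numpy as np

def project_simplex(v):
    # Euclidean projection onto {x : x >= 0, sum(x) = 1} (sort-based algorithm)
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def accelerated_projected_gradient(A, b, n_iter=500):
    # FISTA-style iteration for min 0.5*||A x - b||^2 s.t. x on the simplex
    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the gradient
    x = np.full(A.shape[1], 1.0 / A.shape[1])    # feasible start: uniform weights
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = project_simplex(y - grad / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# tiny demo on random stand-in data
rng = np.random.default_rng(0)
A_demo = rng.standard_normal((200, 61))
b_demo = rng.standard_normal(200)
x_hat = accelerated_projected_gradient(A_demo, b_demo)
print(x_hat.sum(), x_hat.min())                  # should be ~1.0 and >= 0

For a tall, skinny A like yours, each iteration costs only two matrix-vector products, so this should scale well.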