
I have an array of scalars with m rows and n columns. I have a Variable(m) and a Variable(n) that I would like to find solutions for.

The two variables represent values that need to be broadcast over the columns and rows respectively.

I was naively thinking of writing the variables as Variable((m, 1)) and Variable((1, n)), and adding them together as if they were ndarrays. However, that doesn't work, as broadcasting is not allowed.

import cvxpy as cp
import numpy as np

# Problem data.
m = 3
n = 4
np.random.seed(1)
data = np.random.randn(m, n)

# Construct the problem.
x = cp.Variable((m, 1))
y = cp.Variable((1, n))

objective = cp.Minimize(cp.sum(cp.abs(x + y + data)))
# or:
#objective = cp.Minimize(cp.sum_squares(x + y + data))

prob = cp.Problem(objective)

result = prob.solve()
print(x.value)
print(y.value)

This fails on the x + y expression: ValueError: Cannot broadcast dimensions (3, 1) (1, 4).

Now I'm wondering two things:

  • Is my problem indeed solvable using convex optimization?
  • If yes, how can I express it in a way that cvxpy understands?

I'm very new to the concept of convex optimization, as well as cvxpy, and I hope I described my problem well enough.

  • This is a fairly straightforward linear regression, where you are fitting a collection of observations (`data`) against a bunch of categorical explanatory variables (one for each row and one for each column), with no fixed offset (intercept). You want to calculate the coefficients for each variable that minimize absolute error of the prediction. This is convex, so `cvxpy` should work. Or you might try a regression package. If you’re willing to minimize mean squared error instead, you would have a standard regression that you could write in closed form (set first derivative equal to zero). – Matthias Fripp Jun 08 '19 at 19:25
  • You’re welcome. I can also show you how to write this up as a linear program to solve in PuLP or Pyomo if you like. The trick to that is to define “up” and “down” error variables for each data point, then require that data + up - down = prediction, then minimize the sum of all the error variables. That will be exactly equivalent to the problem you’ve shown, and should solve pretty quickly. – Matthias Fripp Jun 09 '19 at 06:40
  • That would be extremely helpful indeed, if you would want to do that. I'm not tied to cvxpy at all, so if either of the libraries you mention are more suitable for the task, I'd be glad to learn. I come from a non-math, non-data-science background, and I'm struggling with the terminology and concepts. – Just van Rossum Jun 09 '19 at 07:11
  • The difference between cvxpy's automatic linearization and a manual LP model linearizing the abs-function will be negligible (and more or less the same problem is passed to the solver). The core problem is the same: don't tackle a dense optimization problem with sparse optimizers (all "good" LP solvers). If pulp or pyomo is faster, then it's solely due to a different solver being used, as cvxpy (in your case, probably; it depends on specifics) defaults to an open-source interior-point solver designed to solve more general problems. But LP solvers can be used too (still with those caveats). – sascha Jun 09 '19 at 11:41
  • I now realize I need to learn more about the subject before I can even state my problem in a decent way. My data matrix will actually usually be sparse, so I indeed need to look for a solution that doesn't take away that advantage. One more attempt to describe the actual problem: For a (sparse) `m` by `n` data matrix, indexed by `i` and `j`, I need to find the best `x[m]` and `y[n]` solutions that approximate the data like this: `x[i] + y[j] ≈ data[i, j]`. – Just van Rossum Jun 10 '19 at 08:22

2 Answers


I offered to show you how to represent this as a linear program, so here it goes. I'm using Pyomo, since I'm more familiar with that, but you could do something similar in PuLP.

To run this, you will need to first install Pyomo and a linear program solver like glpk. glpk should work for reasonable-sized problems, but if you are finding it's taking too long to solve, you could try a (much faster) commercial solver like CPLEX or Gurobi.

You can install Pyomo via pip install pyomo or conda install -c conda-forge pyomo. You can install glpk from https://www.gnu.org/software/glpk/ or via conda install glpk. (I think PuLP comes with a version of glpk built-in, so that might save you a step.)

Here's the script. Note that this calculates absolute error as a linear expression by defining one variable for the positive component of the error and another for the negative part. Then it seeks to minimize the sum of both. In this case, the solver will always set one to zero since that's an easy way to reduce the error, and then the other will be equal to the absolute error.

import random
import pyomo.environ as po

random.seed(1)

# ~50% sparse data set, big enough to populate every row and column
m = 10   # number of rows
n = 10   # number of cols
data = {
    (r, c): random.random()
    for r in range(m)
    for c in range(n)
    if random.random() >= 0.5
}

# define a linear program to find vectors
# x in R^m, y in R^n, such that x[r] + y[c] is close to data[r, c]

# create an optimization model object
model = po.ConcreteModel()

# create indexes for the rows and columns
model.ROWS = po.Set(initialize=range(m))
model.COLS = po.Set(initialize=range(n))

# create indexes for the dataset
model.DATAPOINTS = po.Set(dimen=2, initialize=data.keys())

# data values
model.data = po.Param(model.DATAPOINTS, initialize=data)

# create the x and y vectors
model.X = po.Var(model.ROWS, within=po.NonNegativeReals)
model.Y = po.Var(model.COLS, within=po.NonNegativeReals)

# create dummy variables to represent errors
model.ErrUp = po.Var(model.DATAPOINTS, within=po.NonNegativeReals)
model.ErrDown = po.Var(model.DATAPOINTS, within=po.NonNegativeReals)

# Force the error variables to match the error
def Calculate_Error_rule(model, r, c):
    pred = model.X[r] + model.Y[c]
    err = model.ErrUp[r, c] - model.ErrDown[r, c]
    return (model.data[r, c] + err == pred)
model.Calculate_Error = po.Constraint(
    model.DATAPOINTS, rule=Calculate_Error_rule
)

# Minimize the total error
def ClosestMatch_rule(model):
    return sum(
        model.ErrUp[r, c] + model.ErrDown[r, c]
        for (r, c) in model.DATAPOINTS
    )
model.ClosestMatch = po.Objective(
    rule=ClosestMatch_rule, sense=po.minimize
)

# Solve the model

# get a solver object
opt = po.SolverFactory("glpk")
# solve the model
# turn off "tee" if you want less verbose output
results = opt.solve(model, tee=True)

# show solution status
print(results)

# show verbose description of the model
model.pprint()

# show X and Y values in the solution
for r in model.ROWS:
    print('X[{}]: {}'.format(r, po.value(model.X[r])))
for c in model.COLS:
    print('Y[{}]: {}'.format(c, po.value(model.Y[c])))

Just to complete the story, here's a solution that's closer to your original example. It uses cvxpy, but with the sparse data approach from my solution.

I don't know the "official" way to do elementwise calculations with cvxpy, but it seems to work OK to just use the standard Python sum function with a lot of individual cp.abs(...) calculations.

This gives a solution that is very slightly worse than the linear program, but you may be able to fix that by adjusting the solution tolerance.

import cvxpy as cp
import random

random.seed(1)

# Problem data.
# ~50% sparse data set
m = 10   # number of rows
n = 10   # number of cols
data = {
    (i, j): random.random()
    for i in range(m)
    for j in range(n)
    if random.random() >= 0.5
}

# Construct the problem.
x = cp.Variable(m)
y = cp.Variable(n)

objective = cp.Minimize(
    sum(
        cp.abs(x[i] + y[j] + data[i, j])
        for (i, j) in data.keys()
    )
)

prob = cp.Problem(objective)

result = prob.solve()
print(x.value)
print(y.value)
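
For the tolerance adjustment mentioned above, here's a minimal sketch. It assumes the ECOS solver is available (cvxpy's usual open-source default for problems like this); the specific tolerance values are illustrative, not tuned:

# tighter solver tolerances may close the small gap vs. the LP solution
result = prob.solve(solver=cp.ECOS, abstol=1e-9, reltol=1e-9)
print(result)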
Matthias Fripp
  • Thank you very much, that's very generous of you. Works like a charm! I will need some time to study it to better understand what's going on. – Just van Rossum Jun 11 '19 at 06:00
  • Just added a solution using `cvxpy`. It's pretty similar to your original approach but works with sparse data and avoids the broadcasting errors. It's also considerably simpler than the `pyomo` approach. – Matthias Fripp Jun 11 '19 at 08:27
  • That's an awesome demo, too. Thank you so much, that's super helpful. – Just van Rossum Jun 11 '19 at 14:32

I did not fully get the idea, so here is just some hacky stuff based on the assumption:

  • you want some cvxpy-equivalent to numpy's broadcasting-rules behaviour on arrays (m, 1) + (1, n)

So numpy-wise:

m = 3
n = 4
np.random.seed(1)

a = np.random.randn(m, 1)
b = np.random.randn(1, n)

a
array([[ 1.62434536],
       [-0.61175641],
       [-0.52817175]])

b
array([[-1.07296862,  0.86540763, -2.3015387 ,  1.74481176]])


a + b
array([[ 0.55137674,  2.48975299, -0.67719333,  3.36915713],
       [-1.68472504,  0.25365122, -2.91329511,  1.13305535],
       [-1.60114037,  0.33723588, -2.82971045,  1.21664001]])

Let's mimic this with np.kron, which has a cvxpy-equivalent:

aLifted = np.kron(np.ones((1,n)), a)
bLifted = np.kron(np.ones((m,1)), b)

aLifted
array([[ 1.62434536,  1.62434536,  1.62434536,  1.62434536],
       [-0.61175641, -0.61175641, -0.61175641, -0.61175641],
       [-0.52817175, -0.52817175, -0.52817175, -0.52817175]])

bLifted
array([[-1.07296862,  0.86540763, -2.3015387 ,  1.74481176],
       [-1.07296862,  0.86540763, -2.3015387 ,  1.74481176],
       [-1.07296862,  0.86540763, -2.3015387 ,  1.74481176]])

aLifted + bLifted
array([[ 0.55137674,  2.48975299, -0.67719333,  3.36915713],
       [-1.68472504,  0.25365122, -2.91329511,  1.13305535],
       [-1.60114037,  0.33723588, -2.82971045,  1.21664001]])

Let's check cvxpy semi-blindly (we only check dimensions; too lazy to set up a problem and fix variables to check the output :-D):

import cvxpy as cp
x = cp.Variable((m, 1))
y = cp.Variable((1, n))

cp.kron(np.ones((1,n)), x) + cp.kron(np.ones((m, 1)), y)
# Expression(AFFINE, UNKNOWN, (3, 4))

# looks good!
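
To close the loop, here's a small end-to-end sketch (not part of the semi-blind check above) that plugs the kron-lifted expression into the question's original objective, using the same m, n and data:

import cvxpy as cp
import numpy as np

m = 3
n = 4
np.random.seed(1)
data = np.random.randn(m, n)

x = cp.Variable((m, 1))
y = cp.Variable((1, n))

# kron against a ones-vector replicates x across columns and y across rows
broadcast_sum = cp.kron(np.ones((1, n)), x) + cp.kron(np.ones((m, 1)), y)

objective = cp.Minimize(cp.sum(cp.abs(broadcast_sum + data)))
prob = cp.Problem(objective)
prob.solve()
print(x.value)
print(y.value)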

Now some caveats:

  • I don't know how efficiently cvxpy can reason about this matrix form internally
    • unclear whether it's more efficient than a simple list-comprehension-based form using cp.vstack and co (it probably is; a sketch of that form follows below)
  • this operation itself kills all sparsity
    • (if both vectors are dense, your matrix is dense)
    • cvxpy and more or less all convex-optimization solvers are based on some sparsity assumption
      • scaling this problem up to machine-learning dimensions will not make you happy
  • there is probably a much more concise mathematical theory for your problem than using (sparsity-assuming) pretty general convex optimization (DCP, as implemented in cvxpy, is a subset)
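
For reference, a minimal sketch of the list-comprehension form mentioned in the caveats, reusing the x, y, m, n from above (shown only to illustrate the shape semantics, not benchmarked):

# x[i, 0] is a scalar expression; cvxpy promotes it across the (1, n) row y
rows = [x[i, 0] + y for i in range(m)]
expr = cp.vstack(rows)   # stacks the m row expressions into an (m, n) expression
print(expr.shape)        # (3, 4), same as the kron-based form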
sascha
  • Thank you very much! Both the `kron` and `vstack`/`hstack` solutions work so far. And indeed it doesn't scale up well to values of m and n in the hundreds. Ideally I want it to work with sizes up to the thousands. I will try to get a more mathematically sound definition of the problem. – Just van Rossum Jun 09 '19 at 05:59