
I am trying to use the cvxpy library to solve a very simple least-squares problem, but I found that cvxpy gives very different results when I use sum_squares versus norm(x, 2) as the loss function. The same happens when I compare the l1 norm with the sum of absolute values.

Do these differences come from the mathematical definition of the optimization problem, or from the library's implementation?

Here is my code example:

import cvxpy as cvx

s = cvx.Variable(n)

constraints = [cvx.sum(s) == 1, s >= 0, s <= 1]

# Formulation 1: sum of squares
prob = cvx.Problem(cvx.Minimize(cvx.sum_squares(A * s - y)), constraints)

# Formulation 2: l2 norm
prob = cvx.Problem(cvx.Minimize(cvx.norm(A * s - y, 2)), constraints)

Both y and s are vectors representing histograms. y is the histogram of the randomized data, and s is the original histogram that I want to recover. A is an n*m "transition" matrix of probabilities describing how s is randomized into y.
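For context, a minimal reproducible setup might look like the following sketch (the dimensions and data here are made up for illustration; the asker's actual A, s, and y are not given). Each column of A is a probability distribution, so the columns sum to 1, and y is the randomized version of the true histogram:

```python
import numpy as np

# Hypothetical setup: n = m = 10 so that A @ s_true is well-defined.
rng = np.random.default_rng(0)
n = m = 10

A = rng.random((m, n))
A /= A.sum(axis=0, keepdims=True)   # make each column a probability distribution

s_true = rng.random(n)
s_true /= s_true.sum()              # true histogram lies on the simplex
y = A @ s_true                      # observed (randomized) histogram
```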

The following are the histograms of the recovered variable s:

[Histogram of s using sum_squares]

[Histogram of s using norm(·, 2)]

skinfaxi
  • it would help if you share the results that you are getting along with the expected results. – codingbbq Dec 27 '18 at 18:24
  • @noocode I just updated the question. Do you have any clue of how this happens? – skinfaxi Dec 27 '18 at 19:16
  • Add some reproducible example and check the solver state. It is hard to reason about your problem otherwise. But at the core, sum of squares and the norm are two very different things, e.g. a QP vs. an SOCP optimization problem. Depending on what cvxpy does, in some extreme case it might even choose a different solver (as these are different optimization problems), although I doubt it does. I would expect the norm problem to be the better/more stable formulation. – sascha Dec 27 '18 at 21:07
  • I'm not sure but CVXR has some interesting material on the subject: http://cvxr.com/cvx/doc/advanced.html#eliminating-quadratic-forms I'd be interested if someone can interpret that into an answer. – Jacques Kvam Dec 29 '18 at 22:50

1 Answer

Let's examine both problems using the Lagrangian (penalized) formulation:

$$\begin{align*} {\left\| A x - b \right\|}_{2} + \lambda R \left( x \right) \tag{1} \\ {\left\| A x - b \right\|}_{2}^{2} + \lambda R \left( x \right) \tag{2} \end{align*}$$


Clearly, for the same value of $ \lambda $, problems (1) and (2) will give different results, as the weight given to the fidelity term ($ A x - b $) relative to the regularizer differs between the two.

There is nothing wrong with the code or the solver, it is just a different model.
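A tiny numerical sketch of this effect (a hypothetical 1-D problem with made-up values for a, b, and $\lambda$, unrelated to the asker's data, using $ R(x) = |x| $): for the same $\lambda$, the penalized norm objective (1) and the penalized squared-norm objective (2) pick different minimizers, found here by brute-force grid search:

```python
import numpy as np

a, b, lam = 2.0, 3.0, 1.0
x = np.linspace(-2.0, 4.0, 60001)   # dense grid over a plausible range

f1 = np.abs(a * x - b) + lam * np.abs(x)   # objective (1): norm fidelity
f2 = (a * x - b) ** 2 + lam * np.abs(x)    # objective (2): squared fidelity

x1 = x[np.argmin(f1)]
x2 = x[np.argmin(f2)]
print(x1, x2)   # different minimizers for the same lambda
```

Here (1) is minimized at x = 1.5, where the fidelity term vanishes, while the smooth squared fidelity in (2) trades off against the penalty and settles at x = 11/8, illustrating that the two are genuinely different models, not a solver artifact.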

Royi