
I have four arrays, a1,a2,a3,a4, each of length 500. I have a target array at, also of length 500. These arrays each represent the y coordinates of unevenly spaced points on a graph. I have the x coordinates in a separate array.

I want to optimise the coefficients c1,c2,c3,c4 such that the area difference between c1*a1 + c2*a2 + c3*a3 + c4*a4 and at is minimised. The coefficients must sum to 1 and each be between 0 and 1 inclusive.

I currently do this using scipy.optimize.minimize. Using scipy.optimize.LinearConstraint, I constrain my optimisation variable (an array [c1,c2,c3,c4]) to sum to 1. The loss function multiplies the arrays a1,a2,a3,a4 by the coefficients, sums them, and takes the absolute difference between this sum and at. This absolute difference is integrated between each consecutive pair of x coordinates (trapezoidal rule), and the sum of these integrals is returned.

Here is the code:

xpoint_diffs = xpoints[1:] - xpoints[:-1]
a1a2a3a4 = np.array([a1,a2,a3,a4])

def lossfunc(x):
    absdiff = abs(at - (a1a2a3a4 * x).sum(axis = 1))
    return ((absdiff[:-1] + absdiff[1:])/2 * xpoint_diffs).sum()

I then pass my constraint and loss function into scipy.optimize.minimize.
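Roughly, the full call looks like this (a simplified sketch; the bounds argument is one way to keep the coefficients between 0 and 1 inclusive, and the initial guess is the equal-weight one mentioned in the comments):

import numpy as np
from scipy import optimize

# the coefficients must sum to 1
constraint = optimize.LinearConstraint(np.ones(4), 1., 1.)

# each coefficient stays between 0 and 1 inclusive
bounds = [(0., 1.)] * 4

# simple initial guess: equal weights
x0 = np.full(4, 0.25)

result = optimize.minimize(lossfunc, x0, bounds=bounds, constraints=[constraint])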

Is there a faster way to do this?

  • Just a small optimization: You can pull the division by 2 in the last line into the ```xpoint_diffs```. Then you can rewrite it into a single ```np.dot```, which should be faster: ```return np.dot(absdiff[:-1] + absdiff[1:], half_xpoint_diffs)``` – Homer512 Nov 29 '22 at 12:38
  • The ```(a1a2a3a4 * x).sum(axis = 1)``` can also be optimized into a single matrix vector product. However, I'm currently a bit confused about the shapes. ```a1a2a3a4``` should be shape ```(4, 500)``` and ```x``` shape ```(4,)```, right? But that does not broadcast. I guess ```x``` is ```(4, 1)```? – Homer512 Nov 29 '22 at 12:55
  • Using `abs` inside one of your functions makes your optimization problem non-differentiable and thus it violates the mathematical assumptions of most algorithms under the hood of scipy's `minimize`. Just square the expression on the right hand side of abs_diff. That said, the performance of solving a nonlinear optimization problem depends on: a) your initial guess b) your used algorithm c) the implementation of your functions d) how you calculate the derivatives (exact vs approximation by finite differences). Providing a full minimal **reproducible** example would make it easier for us to help. – joni Nov 29 '22 at 12:56
  • @Homer512 that's a good point, thank you. I will turn them both into dot products. Yes, the shape of x is (4,1), you are correct. – milandeleev Nov 29 '22 at 12:57
  • @joni This is essentially my example. I test this using numpy.random.rand matrices of (500,1) to produce ```a1,a2,a3,a4,at```. I don't select an algorithm, and the implementation of my functions is as above. I don't calculate any derivatives and my initial guess is super simple - just (0.25,0.25,0.25,0.25) in this case. – milandeleev Nov 29 '22 at 13:00
  • But if ```a1a2a3a4``` is shape ```(4, 500)``` and ```x``` is shape ```(4, 1)```, then ```(a1a2a3a4 * x).sum(axis=1)``` is shape ```(4, )```. Don't you need shape ```(500, )``` to then integrate the individual line segments? – Homer512 Nov 29 '22 at 13:17
  • I suspect this is actually an LP (after linearizing the abs(.) function). Obviously, an LP solver should be the fastest for an LP problem. – Erwin Kalvelagen Nov 29 '22 at 15:50
  • @ErwinKalvelagen how would I go about making this problem linear? I know it's convex, but the squared term is throwing me off. – milandeleev Nov 30 '22 at 12:18
  • The loss function is piecewise linear in x (due to the abs(.)) but can be linearized using standard techniques. The result is a linear model (i.e. an LP). I don't see any squared term. I am looking at your lossfunc(x) function. – Erwin Kalvelagen Dec 01 '22 at 01:28
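
To make the LP suggestion in the comments above concrete, here is a sketch of the standard linearization using scipy.optimize.linprog (my illustration, not code from the thread; the random test data and the trapezoidal weights are hypothetical stand-ins). The absolute residual is replaced by auxiliary variables t with t >= r and t >= -r, and the objective minimizes the weighted sum of t:

import numpy as np
from scipy import optimize

rng = np.random.default_rng(0)           # hypothetical test data
a14 = rng.uniform(-1., 1., (500, 4))     # columns are a1, a2, a3, a4
at = rng.uniform(-1., 1., 500)
xpoints = np.cumsum(rng.random(500))

n = at.size
# per-point trapezoidal weights, equivalent to summing (absdiff[i] + absdiff[i+1])/2 * dx
w = 0.5 * np.concatenate((
    [xpoints[1] - xpoints[0]],
    xpoints[2:] - xpoints[:-2],
    [xpoints[-1] - xpoints[-2]]))

# variables z = [c1..c4, t1..tn]; minimize w @ t subject to t >= |a14 @ c - at|
cost = np.concatenate((np.zeros(4), w))
eye = np.eye(n)
A_ub = np.block([[ a14, -eye],    #   a14 @ c - at <= t
                 [-a14, -eye]])   # -(a14 @ c - at) <= t
b_ub = np.concatenate((at, -at))
A_eq = np.concatenate((np.ones(4), np.zeros(n)))[np.newaxis]  # coefficients sum to 1
b_eq = [1.]
bounds = [(0., 1.)] * 4 + [(0., None)] * n

res = optimize.linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:4], res.x[:4].sum())

A problem of this size (roughly 1000 inequality rows and 504 variables) is small by LP standards.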

1 Answer


This isn't a full solution, more like some thoughts that may lead to one. Also, someone please double-check my math.

Variables and test data

First, let's begin by defining some test data. While doing so, I transpose the a1a2a3a4 matrix because it will prove more convenient. Plus, I'm renaming it to a14 because I cannot be bothered writing the whole thing all the time, sorry.

import numpy as np
from numpy import linalg, random
from scipy import optimize

a14 = random.uniform(-1., 1., (500, 4)) * (0.25, 0.75, 1., 2.)
at = random.uniform(-1., 1., 500)
xpoints = np.cumsum(random.random(500))
initial_guess = np.full(4, 0.25)

Target

Okay, here is my first slight point of confusion: OP described the target as the "area difference" between two graphs. X coordinates are given by xpoints. Y coordinates of the target are given as at, and we are looking for coeffs such that a14 @ coeffs gives Y coordinates close to the target.

Now, when we talk about the "area under the graph", my first instinct is piecewise linear interpolation. That is just the trapezoidal rule, which gives us the formula sum{(y[i+1] + y[i]) * (x[i+1] - x[i])} / 2. We can regroup that per sample point to get something of the form (y[0]*(x[1]-x[0]) + y[1]*(x[2]-x[0]) + y[2]*(x[3]-x[1]) + ... + y[n-1]*(x[n-1]-x[n-2])) / 2. We want to precompute the x differences and the scaling factor.

weights = 0.5 * np.concatenate((
    (xpoints[1] - xpoints[0],),
    xpoints[2:] - xpoints[:-2],
    (xpoints[-1] - xpoints[-2],)))

On closer inspection, these per-point weights are just OP's per-segment factors regrouped (both are the trapezoidal rule applied to the absolute difference), so the loss below should match OP's integration exactly.
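
A quick numerical check of that equivalence, using the test data defined above (the residual at the initial guess serves as arbitrary per-point values):

absdiff = np.abs(at - a14 @ initial_guess)
xpoint_diffs = xpoints[1:] - xpoints[:-1]

per_segment = ((absdiff[:-1] + absdiff[1:]) / 2 * xpoint_diffs).sum()  # OP's formulation
per_point = (absdiff * weights).sum()                                  # regrouped weights
assert np.isclose(per_segment, per_point)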

Loss function

Adapting OP's loss function to my interpretation gives

def loss(coeffs):
    absdiff = abs(at - (a14 * coeffs).sum(axis=1))
    return (absdiff * weights).sum()

As I've noted in the comments, this can be simplified by using matrix-vector products.

def loss(coeffs):
    return abs(a14 @ coeffs - at) @ weights

This is the same as abs((a14 * weights[:, np.newaxis]) @ coeffs - (at * weights)).sum() which in turn makes it obvious that we are talking about minimizing the L1 norm.

a14w = a14 * weights[:, np.newaxis]
atw = at * weights
def loss(coeffs):
    return linalg.norm(a14w @ coeffs - atw, 1)

This isn't a huge improvement as far as computing the loss function goes. However, it goes a long way towards pressing our problem into a regular pattern.

Approximate solution

As noted in comments, the abs() function in the L1 norm is poison for optimization because it is non-differentiable. If we switch to an L2 norm, we basically have a least squares fit with additional constraints. Of course this is somewhat unsound as we start solving a different, if closely related, problem. I suspect the solution would be "good enough"; or it could be the starting solution that is then polished to conform with the actual target.

In any case, we can use scipy.optimize.lsq_linear as a quick and easy solver. However, it only supports simple bounds, not the linear constraint that the coefficients sum to 1. We can mimic that constraint with a regularization (penalty) row.

def loss_lsqr(coeffs):
    return 0.5 * linalg.norm(a14w @ coeffs - atw)**2

grad1 = a14w.T @ a14w
grad2 = a14w.T @ atw
def jac_lsqr(coeffs):
    return grad1 @ coeffs - grad2

regularization = loss_lsqr(initial_guess) # TODO: Pick better value
a14w_regular = np.append(a14w, np.full((1, 4), regularization), axis=0)
atw_regular = np.append(atw, regularization)
approx = optimize.lsq_linear(a14w_regular, atw_regular, bounds=(0., 1.))

# polish further, maybe?
bounds = ((0., 1.),) * 4
constraint = optimize.LinearConstraint(np.ones(4), 1., 1.)
second_guess = approx.x.copy()
second_guess /= second_guess.sum()  # rescale so the coefficients sum to 1
exact = optimize.minimize(
        loss_lsqr, second_guess, jac=jac_lsqr,
        bounds=bounds, constraints=constraint)
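
For what it's worth, a quick way to compare both solutions on the original weighted-L1 objective and the sum-to-1 constraint (loss here is the final L1 version defined above):

for name, sol in (("lsq_linear", approx.x), ("polished", exact.x)):
    print(name, sol, "sum:", sol.sum(), "L1 loss:", loss(sol))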
Homer512
  • This is incredibly interesting and I'm going to spend my time now trying to understand what you've written. Thanks so much for putting your time and effort into it. – milandeleev Nov 30 '22 at 10:18
  • Since writing this question I've found an analytical way to solve this problem using cvxpy: essentially, performing the minimization of ```((at - (a1a2a3a4 @ c1c2c3c4))**2 @ xpoints)```. This is much slower than scipy.optimize.minimize, but gives _the_ answer. – milandeleev Nov 30 '22 at 10:27
  • @milandeleev That is not solving it analytically at all, but rather numerically. Note that the solutions will be different from an L1 objective. – Erwin Kalvelagen Nov 30 '22 at 19:40
  • @milandeleev Are you talking about this approach? http://ash-aldujaili.github.io/blog/2016/09/01/lp-norm-min/ Sounds reasonable. If you could put your own answer with your code, I would be interested to see the implementation – Homer512 Nov 30 '22 at 19:56
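
For readers curious about the cvxpy route mentioned in these comments, here is a minimal sketch of such a formulation for the original weighted-L1 objective (an illustration only, not the commenter's actual code; the random test data and trapezoidal weights are the same hypothetical stand-ins used earlier):

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
a14 = rng.uniform(-1., 1., (500, 4))
at = rng.uniform(-1., 1., 500)
xpoints = np.cumsum(rng.random(500))
weights = 0.5 * np.concatenate((
    (xpoints[1] - xpoints[0],),
    xpoints[2:] - xpoints[:-2],
    (xpoints[-1] - xpoints[-2],)))

coeffs = cp.Variable(4)
objective = cp.Minimize(cp.sum(cp.multiply(weights, cp.abs(a14 @ coeffs - at))))
constraints = [cp.sum(coeffs) == 1, coeffs >= 0, coeffs <= 1]
cp.Problem(objective, constraints).solve()
print(coeffs.value, coeffs.value.sum())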