
I have four arrays, a1,a2,a3,a4, each of length 500. I have a target array at, also of length 500. These arrays each represent the y coordinates of unevenly spaced points on a graph. I have the x coordinates in a separate array.

I want to optimise the coefficients c1,c2,c3,c4 such that the area difference between c1*a1 + c2*a2 + c3*a3 + c4*a4 and at is minimised. The coefficients must sum to 1 and each be between 0 and 1 inclusive.

I currently do this using scipy.optimize.minimize. Using scipy.optimize.LinearConstraint, I constrain my optimisation variable (an array [c1,c2,c3,c4]) to sum to 1. The loss function multiplies the arrays a1,a2,a3,a4 by the coefficients, sums them, and takes the absolute difference between this sum and at. This absolute difference is integrated between each consecutive pair of x coordinates (trapezoidal rule), and the sum of these integrals is returned.

Here is the code:

xpoint_diffs = xpoints[1:] - xpoints[:-1]
a1a2a3a4 = np.array([a1,a2,a3,a4])

def lossfunc(x):
    absdiff = abs(at - (a1a2a3a4 * x).sum(axis = 1))
    return ((absdiff[:-1] + absdiff[1:])/2 * xpoint_diffs).sum()

I then pass my constraint and loss function into scipy.optimize.minimize.
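Roughly, the full call looks like this (a simplified sketch; the bounds argument is one way to keep the coefficients between 0 and 1 inclusive, and the initial guess is the equal-weight one mentioned in the comments):

import numpy as np
from scipy import optimize

# the coefficients must sum to 1
constraint = optimize.LinearConstraint(np.ones(4), 1., 1.)

# each coefficient stays between 0 and 1 inclusive
bounds = [(0., 1.)] * 4

# simple initial guess: equal weights
x0 = np.full(4, 0.25)

result = optimize.minimize(lossfunc, x0, bounds=bounds, constraints=[constraint])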

Is there a faster way to do this?

  • Just a small optimization: You can pull the division by 2 in the last line into the ```xpoint_diffs```. Then you can rewrite it into a single ```np.dot```, which should be faster: ```return np.dot(absdiff[:-1] + absdiff[1:], half_xpoint_diffs)``` – Homer512 Nov 29 '22 at 12:38
  • The ```(a1a2a3a4 * x).sum(axis = 1)``` can also be optimized into a single matrix vector product. However, I'm currently a bit confused about the shapes. ```a1a2a3a4``` should be shape ```(4, 500)``` and ```x``` shape ```(4,)```, right? But that does not broadcast. I guess ```x``` is ```(4, 1)```? – Homer512 Nov 29 '22 at 12:55
  • Using `abs` inside one of your functions makes your optimization problem non-differentiable and thus it violates the mathematical assumptions of most algorithms under the hood of scipy's `minimize`. Just square the expression on the right hand side of abs_diff. That said, the performance of solving a nonlinear optimization problem depends on: a) your initial guess b) your used algorithm c) the implementation of your functions d) how you calculate the derivatives (exact vs approximation by finite differences). Providing a full minimal **reproducible** example would make it easier for us to help. – joni Nov 29 '22 at 12:56
  • @Homer512 that's a good point, thank you. I will turn them both into dot products. Yes, the shape of x is (4,1), you are correct. – milandeleev Nov 29 '22 at 12:57
  • @joni This is essentially my example. I test this using numpy.random.rand matrices of (500,1) to produce ```a1,a2,a3,a4,at```. I don't select an algorithm, and the implementation of my functions is as above. I don't calculate any derivatives and my initial guess is super simple - just (0.25,0.25,0.25,0.25) in this case. – milandeleev Nov 29 '22 at 13:00
  • But if ```a1a2a3a4``` is shape ```(4, 500)``` and ```x``` is shape ```(4, 1)```, then ```(a1a2a3a4 * x).sum(axis=1)``` is shape ```(4, )```. Don't you need shape ```(500, )``` to then integrate the individual line segments? – Homer512 Nov 29 '22 at 13:17
  • I suspect this is actually an LP (after linearizing the abs(.) function). Obviously, an LP solver should be the fastest for an LP problem. – Erwin Kalvelagen Nov 29 '22 at 15:50
  • @ErwinKalvelagen how would I go about making this problem linear? I know it's convex, but the squared term is throwing me off. – milandeleev Nov 30 '22 at 12:18
  • The loss function is piecewise linear in x (due to the abs(.)) but can be linearized using standard techniques. The result is a linear model (i.e. an LP). I don't see any squared term. I am looking at your lossfunc(x) function. – Erwin Kalvelagen Dec 01 '22 at 01:28
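
To make the LP suggestion in the comments above concrete, here is a sketch of the standard linearization using scipy.optimize.linprog (my illustration, not code from the thread; the random test data and the trapezoidal weights are hypothetical stand-ins). The absolute residual is replaced by auxiliary variables t with t >= r and t >= -r, and the objective minimizes the weighted sum of t:

import numpy as np
from scipy import optimize

rng = np.random.default_rng(0)           # hypothetical test data
a14 = rng.uniform(-1., 1., (500, 4))     # columns are a1, a2, a3, a4
at = rng.uniform(-1., 1., 500)
xpoints = np.cumsum(rng.random(500))

n = at.size
# per-point trapezoidal weights, equivalent to summing (absdiff[i] + absdiff[i+1])/2 * dx
w = 0.5 * np.concatenate((
    [xpoints[1] - xpoints[0]],
    xpoints[2:] - xpoints[:-2],
    [xpoints[-1] - xpoints[-2]]))

# variables z = [c1..c4, t1..tn]; minimize w @ t subject to t >= |a14 @ c - at|
cost = np.concatenate((np.zeros(4), w))
eye = np.eye(n)
A_ub = np.block([[ a14, -eye],    #   a14 @ c - at <= t
                 [-a14, -eye]])   # -(a14 @ c - at) <= t
b_ub = np.concatenate((at, -at))
A_eq = np.concatenate((np.ones(4), np.zeros(n)))[np.newaxis]  # coefficients sum to 1
b_eq = [1.]
bounds = [(0., 1.)] * 4 + [(0., None)] * n

res = optimize.linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:4], res.x[:4].sum())

A problem of this size (roughly 1000 inequality rows and 504 variables) is small by LP standards.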

1 Answer


This isn't a full solution, more like some thoughts that may lead to one. Also, someone please double-check my math.

Variables and test data

First, let's begin by defining some test data. While doing so, I transpose the a1a2a3a4 matrix because it will prove more convenient. Plus, I'm renaming it to a14 because I cannot be bothered writing the whole thing all the time, sorry.

import numpy as np
from numpy import linalg, random
from scipy import optimize

a14 = random.uniform(-1., 1., (500, 4)) * (0.25, 0.75, 1., 2.)
at = random.uniform(-1., 1., 500)
xpoints = np.cumsum(random.random(500))
initial_guess = np.full(4, 0.25)

Target

Okay, here is my first slight point of confusion: OP described the target as the "area difference" between two graphs. X coordinates are given by xpoints. Y coordinates of the target are given as at, and we are looking for coeffs such that a14 @ coeffs gives Y coordinates close to the target.

Now, when we talk about the "area under the graph", my first instinct is piecewise linear interpolation. That is just the trapezoidal rule, which gives us the formula sum{(y[i+1] + y[i]) * (x[i+1] - x[i])} / 2. We can regroup that per sample point to get something of the form (y[0]*(x[1]-x[0]) + y[1]*(x[2]-x[0]) + y[2]*(x[3]-x[1]) + ... + y[n-1]*(x[n-1]-x[n-2])) / 2. We want to precompute the x differences and the scaling factor.

weights = 0.5 * np.concatenate((
    (xpoints[1] - xpoints[0],),
    xpoints[2:] - xpoints[:-2],
    (xpoints[-1] - xpoints[-2],)))

On closer inspection, these per-point weights are just OP's per-segment factors regrouped (both are the trapezoidal rule applied to the absolute difference), so the loss below should match OP's integration exactly.
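
A quick numerical check of that equivalence, using the test data defined above (the residual at the initial guess serves as arbitrary per-point values):

absdiff = np.abs(at - a14 @ initial_guess)
xpoint_diffs = xpoints[1:] - xpoints[:-1]

per_segment = ((absdiff[:-1] + absdiff[1:]) / 2 * xpoint_diffs).sum()  # OP's formulation
per_point = (absdiff * weights).sum()                                  # regrouped weights
assert np.isclose(per_segment, per_point)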

Loss function

Adapting OP's loss function to my interpretation gives

def loss(coeffs):
    absdiff = abs(at - (a14 * coeffs).sum(axis=1))
    return (absdiff * weights).sum()

As I've noted in the comments, this can be simplified by using matrix-vector products.

def loss(coeffs):
    return abs(a14 @ coeffs - at) @ weights

This is the same as abs((a14 * weights[:, np.newaxis]) @ coeffs - (at * weights)).sum() which in turn makes it obvious that we are talking about minimizing the L1 norm.

a14w = a14 * weights[:, np.newaxis]
atw = at * weights
def loss(coeffs):
    return linalg.norm(a14w @ coeffs - atw, 1)

This isn't a huge improvement as far as computing the loss function goes. However, it goes a long way towards pressing our problem into a regular pattern.

Approximate solution

As noted in comments, the abs() function in the L1 norm is poison for optimization because it is non-differentiable. If we switch to an L2 norm, we basically have a least squares fit with additional constraints. Of course this is somewhat unsound as we start solving a different, if closely related, problem. I suspect the solution would be "good enough"; or it could be the starting solution that is then polished to conform with the actual target.

In any case, we can use scipy.optimize.lsq_linear as a quick and easy solver. However, it only supports simple bounds, not the linear constraint that the coefficients sum to 1. We can mimic that constraint with a regularization (penalty) row.

def loss_lsqr(coeffs):
    return 0.5 * linalg.norm(a14w @ coeffs - atw)**2

grad1 = a14w.T @ a14w
grad2 = a14w.T @ atw
def jac_lsqr(coeffs):
    return grad1 @ coeffs - grad2

regularization = loss_lsqr(initial_guess) # TODO: Pick better value
a14w_regular = np.append(a14w, np.full((1, 4), regularization), axis=0)
atw_regular = np.append(atw, regularization)
approx = optimize.lsq_linear(a14w_regular, atw_regular, bounds=(0., 1.))

# polish further, maybe?
bounds = ((0., 1.),) * 4
constraint = optimize.LinearConstraint(np.ones(4), 1., 1.)
second_guess = approx.x.copy()
second_guess /= second_guess.sum()  # rescale so the coefficients sum to 1
exact = optimize.minimize(
        loss_lsqr, second_guess, jac=jac_lsqr,
        bounds=bounds, constraints=constraint)
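
For what it's worth, a quick way to compare both solutions on the original weighted-L1 objective and the sum-to-1 constraint (loss here is the final L1 version defined above):

for name, sol in (("lsq_linear", approx.x), ("polished", exact.x)):
    print(name, sol, "sum:", sol.sum(), "L1 loss:", loss(sol))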
Homer512
  • This is incredibly interesting and I'm going to spend my time now trying to understand what you've written. Thanks so much for putting your time and effort into it. – milandeleev Nov 30 '22 at 10:18
  • Since writing this question I've found an analytical way to solve this problem using cvxpy: essentially, performing the minimization of ```((at - (a1a2a3a4 @ c1c2c3c4))**2 @ xpoints)```. This is much slower than scipy.optimize.minimize, but gives _the_ answer. – milandeleev Nov 30 '22 at 10:27
  • @milandeleev That is not solving it analytically at all, but rather numerically. Note that the solutions will be different from an L1 objective. – Erwin Kalvelagen Nov 30 '22 at 19:40
  • @milandeleev Are you talking about this approach? http://ash-aldujaili.github.io/blog/2016/09/01/lp-norm-min/ Sounds reasonable. If you could put your own answer with your code, I would be interested to see the implementation – Homer512 Nov 30 '22 at 19:56
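
For readers curious about the cvxpy route mentioned in these comments, here is a minimal sketch of such a formulation for the original weighted-L1 objective (an illustration only, not the commenter's actual code; the random test data and trapezoidal weights are the same hypothetical stand-ins used earlier):

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
a14 = rng.uniform(-1., 1., (500, 4))
at = rng.uniform(-1., 1., 500)
xpoints = np.cumsum(rng.random(500))
weights = 0.5 * np.concatenate((
    (xpoints[1] - xpoints[0],),
    xpoints[2:] - xpoints[:-2],
    (xpoints[-1] - xpoints[-2],)))

coeffs = cp.Variable(4)
objective = cp.Minimize(cp.sum(cp.multiply(weights, cp.abs(a14 @ coeffs - at))))
constraints = [cp.sum(coeffs) == 1, coeffs >= 0, coeffs <= 1]
cp.Problem(objective, constraints).solve()
print(coeffs.value, coeffs.value.sum())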