
I am interested in modelling data with a loss function that is piecewise linear.

For anyone who is unfamiliar, a deadzone linear loss is like an L1 loss that only begins at certain points. For example, given bounds lowerBound and upperBound, if my function output f(x) falls between these values it incurs zero error; outside of this range, the error propagates out linearly (f(x) - upperBound above, or lowerBound - f(x) below).
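In code, the loss described above might be sketched like this (the function name and signature are mine, not from any library):

```python
import numpy as np

def deadzone_loss(f_x, lower, upper):
    """Piecewise linear 'deadzone' loss: zero inside [lower, upper],
    growing linearly outside it."""
    f_x = np.asarray(f_x, dtype=float)
    below = np.maximum(lower - f_x, 0.0)   # penalty when f(x) < lower
    above = np.maximum(f_x - upper, 0.0)   # penalty when f(x) > upper
    return below + above
```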

This is very close to my mental loss function. Specifically, there are a couple of boundaries (associated with real boundaries in my equipment) outside of which I begin to care, and care linearly; inside them, I do not really care.

My data is produced in real time, with a few hundred outputs from the apparatus at every point in time, and I require a fast computation of this estimator. For some X * beta = Y, my Y usually has a few hundred entries, and my beta a dozen or two (depending on the experiment). Speed is a very critical consideration. Accordingly, I model this via least squares, due to its closed-form estimator (X.T * X)^-1 * X.T * Y. However, solutions sometimes fail to align with my desired output (which is much closer to the deadzone-linear-loss-based output).
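As an aside, the normal-equations formula above can be numerically fragile when X.T @ X is ill-conditioned; a QR/SVD-based solver gives the same estimate more stably at similar cost for these sizes. A minimal NumPy sketch (the shapes are made up to match the "few hundred by a dozen or two" description):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))      # a few hundred observations, ~20 coefficients
beta_true = rng.normal(size=20)
Y = X @ beta_true + 0.1 * rng.normal(size=300)

# Closed form from the question: (X^T X)^{-1} X^T Y
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# Numerically preferable equivalent:
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
```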

Is there a fast algorithm or computational trick to get nearer to this optimum? The best solution I can imagine is a linear program; however, I do not have a lot of experience with their use or comparative speed. Any tricks, guidance, possible approaches, or approximations would be greatly appreciated. Thanks!
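For reference, the LP idea can be written down with one slack variable per observation: minimise the total slack subject to each prediction exceeding its bounds by at most that slack. A sketch using scipy.optimize.linprog (the function name and setup are my own; speed at your problem sizes would need benchmarking):

```python
import numpy as np
from scipy.optimize import linprog

def deadzone_fit_lp(X, lower, upper):
    """Fit beta minimising sum_i deadzone(X_i @ beta; lower_i, upper_i)
    as a linear program. Decision variables: [beta (free), s (>= 0)]."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(n)])   # minimise sum of slacks
    I = np.eye(n)
    # X beta - s <= upper   and   -X beta - s <= -lower
    A_ub = np.block([[X, -I], [-X, -I]])
    b_ub = np.concatenate([upper, -lower])
    bounds = [(None, None)] * p + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]
```

At the optimum each slack s_i equals the deadzone violation of observation i, so the LP objective is exactly the deadzone loss.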

Edit: Each set of observations associated with a point in time (the Xs and Ys at time t) is distinct from previous runs (times other than t). I mention this only to emphasize that I need to run the algorithm many times.

Michael Clinton

1 Answer


First of all, consider the derivative of abs(x - a) + abs(x - b). For x < min(a, b) it is -2, for x > max(a, b) it is +2, and for a <= x <= b it is zero - so your piecewise linear penalty is, up to a factor of two and a constant offset, the sum of two absolute-value functions (two L1 terms): |x - a| + |x - b| = 2 * deadzone(x) + (b - a).
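That identity is easy to check numerically (a and b here are arbitrary bounds chosen for the check):

```python
import numpy as np

a, b = 2.0, 5.0
x = np.linspace(-5.0, 12.0, 1000)

# deadzone: zero inside [a, b], linear outside
deadzone = np.maximum(a - x, 0) + np.maximum(x - b, 0)
two_abs = np.abs(x - a) + np.abs(x - b)

# |x - a| + |x - b| = 2 * deadzone(x) + (b - a)
assert np.allclose(two_abs, 2 * deadzone + (b - a))
```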

Thus it might be worth looking at the considerable body of work devoted to minimising the L1 norm of deviations - for example, https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares. In fact, I think you can take the algorithm for iteratively reweighted least squares minimisation of L1 norms and modify it to handle your dead zone function without explicitly converting each dead zone function into two absolute functions.
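One possible shape for that modification, heavily hedged: give points inside the band a (near-)zero weight, and pull violating points toward the nearest bound with the usual 1/|residual| L1-style weight. This is my own heuristic sketch, not a proven or textbook algorithm, and its convergence would need checking on real data:

```python
import numpy as np

def deadzone_irls(X, lower, upper, iters=50, eps=1e-8):
    """Heuristic IRLS-style sketch for the deadzone loss (an adaptation
    of the L1 IRLS weighting, offered as an illustration only)."""
    # warm start: ordinary least squares toward the band midpoints
    beta = np.linalg.lstsq(X, (lower + upper) / 2.0, rcond=None)[0]
    for _ in range(iters):
        f = X @ beta
        target = np.clip(f, lower, upper)   # nearest point in the band
        viol = np.abs(f - target)           # deadzone violation per point
        if np.all(viol <= eps):
            break                           # everything inside the band: zero loss
        # violators get L1-style weights; in-band points a tiny weight
        w = np.where(viol > eps, 1.0 / np.maximum(viol, eps), eps)
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(sw[:, None] * X, sw * target, rcond=None)[0]
    return beta
```

Each iteration is just a small weighted least-squares solve, so if it behaves well on your data it should be much cheaper than a general-purpose LP at these sizes.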

mcdowella