I am interested in modelling data with a loss function that is piecewise linear.
For anyone who is unfamiliar, a deadzone linear loss function is like a linear L1 loss that only kicks in beyond certain points. For example, if I have points lowerBound and upperBound, and my function output f(x) falls between these values, it has an error of zero; outside of this range, the error grows linearly (f(x) - upperBound or lowerBound - f(x)).
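Concretely, the loss I have in mind looks something like this (a minimal NumPy sketch; the bound names are just placeholders):

```python
import numpy as np

def deadzone_loss(f_x, lowerBound, upperBound):
    # Zero inside [lowerBound, upperBound], growing linearly outside it
    return np.maximum(0.0, f_x - upperBound) + np.maximum(0.0, lowerBound - f_x)
```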
This is very close to my mental loss function: specifically, there are a couple of boundaries (associated with real boundaries in my equipment) outside of which errors matter to me linearly. Inside of them, I do not really care.
My data is produced in real time, with a few hundred outputs from the apparatus at every point in time, and I require a fast computation of this estimator. For some X * beta = Y, my Y is usually a few hundred observations, and my beta is a dozen or two coefficients (depending on the experiment). Speed is a very critical consideration. Accordingly, I model this via least squares because of its closed-form estimator, (X.T * X)^-1 * X.T * Y. However, the solutions sometimes fail to align with my desired output (which is much closer to what this deadzone linear loss would produce).
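To make the baseline concrete, the least-squares step is essentially this (a sketch; np.linalg.lstsq stands in for explicitly forming (X.T * X)^-1 * X.T * Y):

```python
import numpy as np

# X: (a few hundred observations) x (a dozen or two coefficients), Y: matching vector
def least_squares_beta(X, Y):
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta
```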
Is there a fast algorithm or computational trick to get nearer to this optimum? The best solution I can imagine is a linear program; however, I do not have a lot of experience with their use or comparative speed. Any tricks, guidance, possible approaches, or approximations would be greatly appreciated. Thanks!
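In case it helps to make the question concrete, here is roughly how I imagine the LP would look (a sketch with scipy.optimize.linprog; lower and upper are per-observation bound vectors, and the slack-variable formulation is my guess at how to encode the deadzone):

```python
import numpy as np
from scipy.optimize import linprog

def deadzone_fit(X, lower, upper):
    """Deadzone fit as an LP: minimise the total amount by which
    X @ beta falls outside [lower, upper]; zero cost inside."""
    n, p = X.shape
    # Decision variables: [beta (p), s_plus (n), s_minus (n)]
    c = np.concatenate([np.zeros(p), np.ones(n), np.ones(n)])
    # X @ beta - s_plus <= upper      (s_plus absorbs overshoot)
    A_over = np.hstack([X, -np.eye(n), np.zeros((n, n))])
    # -X @ beta - s_minus <= -lower   (s_minus absorbs undershoot)
    A_under = np.hstack([-X, np.zeros((n, n)), -np.eye(n)])
    A_ub = np.vstack([A_over, A_under])
    b_ub = np.concatenate([upper, -lower])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]
```

The slacks s_plus and s_minus pick up how far X @ beta lands above upper or below lower, so the objective matches the deadzone loss; what I cannot judge is whether solving this at a few-hundred-by-a-dozen-or-two scale, many times over, is fast enough compared to the closed-form solve.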
Edit: Each set of observations associated with a point in time (the Xs and Ys at time t) is distinct from previous runs (times other than t). I mention this only to emphasize that I need to run the algorithm many times.