2

I have a number of short time-series ( maybe 30 - 100 time points ), and they have a general shape : they start high, come down quickly, may or may not plateau near zero, and then go back up. If they don't plateau, they look something like a simple quadratic, and if they do plateau, you may have a long series of zeros.

I'm trying to use the lmfit module to fit a piecewise linear curve that is continuous. I'd like to infer where the line changes gradients, that is, I want to know where the curve "qualitatitively" changes gradients. I'd like to know when the gradient stops going down, and when it starts increasing again, in general terms. I'm having a few issues with it :

  • lmfit seems to require at least two parameters, so I'm having to pass _.
  • I'm unsure how to constrain one parameter to be greater than another.
  • I'm getting could not broadcast input array from shape (something) into shape (something) errors

Here's some code. First, my objective function, to be minimised.

def piecewiselinear(params, data, _) :

    t1 = params["t1"].value
    t2 = params["t2"].value
    m1 = params["m1"].value
    m2 = params["m2"].value
    m3 = params["m3"].value
    c = params["c"].value

    # Construct continuous, piecewise-linear fit
    model = np.zeros_like(data)
    model[:t1] = c + m1 * np.arange(t1)
    model[t1:t2] = model[t1-1] + m2 * np.arange(t2 - t1)
    model[t2:] = model[t2-1] + m3 * np.arange(len(data) - t2)

    return model - data

I then call,

p = lmfit.Parameters()
p.add("t1", value = len(data)/4, min = 1, max = len(data))
p.add("t2", value = len(data)/4*3, min = 2, max = len(data))
p.add("m1", value = -100., max=0)
p.add("m2", value = 0.)
p.add("m3", value = 20., min = 1.)
p.add("c", min=0, value = 800.)

result = lmfit.minimize(piecewiselinear, p, args = (data, _) )

The model is that, at some time t1, the gradient of the line changes, and the same happens at t2. Both of these parameters, as well as the gradients of the line segments ( and one intercept ), need to be inferred.

I could do this using MCMC methods, but I have too many of these series, and it would take too long.

Part of the traceback :

     15     model = np.zeros_like(data)
     16     model[:t1] = c + m1 * np.arange(t1)
---> 17     model[t1:t2] = model[t1-1] + m2 * np.arange(t2-t1)
     18     model[t2:] = model[t2-1] + m3 * np.arange(len(data) - t2)
     19 

ValueError: could not broadcast input array from shape (151) into shape (28)

A couple of examples of the time-series : No plateau : steady drop and rise Long, sharp plateau

Any and all suggestions welcome. Thank you very much.

Quentin
  • 338
  • 5
  • 16

2 Answers2

1

Here's a plot from a rather brute-force 3-pwlin fitter; will trade rough code for test cases. enter image description here

Also, a couple of links:
Fit-piecewise-linear-data on dsp.stack might give you some ideas; added a bit on Dynamic programming.
github.com/NickFoubert/simple-segment has python for segmenting e.g. ECGs with max_error (not number of pieces), from a nice paper by Keogh et al., An online algorithm for segmenting time series, 2001, 8p.

And a possible alternative: could you just fit the power p in y ~ x^p, log y ~ p log x^2 (after shifting x to [-1 .. 1] and y > 1e-6 or so) ?
This would be robust, fast, and easy to plot and understand.
One should probably weight the ends so that the errors are roughly flat and normal.
Also one could fit separate p p' to the left and right halves.

Community
  • 1
  • 1
denis
  • 21,378
  • 10
  • 65
  • 88
  • Thanks very much for that link to the DSP post. A few interesting options in there ! Apologies, I realise that my original question wasn't very clear, so I've fixed it. I'm really looking for the knots. Equivalently, by whatever measure ( PWL or not ), I want the two locations where the curves really "change direction" - I think a PWL approximation is the best way to capture that. Sorry for not being clearer before. – Quentin Mar 01 '14 at 19:39
  • Here's a pickle file containing a dict with some data. `x1, x2, x3` are smoothish curves as in my first plot, and represent one extreme of smoothness of the data. `y1, y2, y3` represent the other extreme : very sharp and discontinuous data. Read with `data = pickle.load("so.p")`. www.quentincaudron.com/so.p . Thanks for your perseverance in this. I'm tempted to just go brute force, testing all possible knot locations. Ugly but potentially effective ! – Quentin Mar 03 '14 at 16:55
  • nope, "ValueError: insecure string pickle" ?? np.savetxt a few (or forget it -- diminishing returns) – denis Mar 04 '14 at 11:33
  • Urgh, the pickle file seems to open fine here. Sorry about that. If you're still interested, I have six np.savetxt arrays here : quentincaudron.com/so . Otherwise, no worries. Thanks for the help thus far, it's appreciated. – Quentin Mar 04 '14 at 16:03
0

Going down the brute force route seems to do the trick. I'm just testing all combinations of switchpoints and picking the best fit. It's very quick and can be reasonably robust. Here's the result of one particular fit.

enter image description here

I'm forcing the gradient of the second line to be zero. This ensures that we don't get an OK fit for two lines and a perfect fit for one, which may grab a higher score ( I'm using the sum of R^2 values here ). In green are marked the switchpoints, and these should work very well for my application.

I'd love to learn a more elegant want to do this, but in the meantime, this is an option...

Quentin
  • 338
  • 5
  • 16