
My dataset:

mydata<-structure(list(t = c(0.208333333, 0.208333333, 0.208333333, 0.208333333, 
1, 1, 1, 1, 2, 2, 2, 2, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 
16, 16, 0.208333333, 0.208333333, 0.208333333, 0.208333333, 1, 
1, 1, 1, 2, 2, 2, 2), parent = c(1.2, 1.4, 0.53, 1.2, 1, 0.72, 
0.93, 1.1, 0.88, 0.38, 0.45, 0.27, 0.057, 0.031, 0.025, 0.051, 
0.027, 0.015, 0.034, 0.019, 0.017, 0.025, 0.024, 0.023, 0.29, 
0.22, 0.34, 0.19, 0.12, 0.092, 0.41, 0.28, 0.064, 0.05, 0.058, 
0.043)), .Names = c("t", "Ct"), row.names = c(325L, 326L, 
327L, 328L, 341L, 342L, 343L, 344L, 357L, 358L, 359L, 360L, 373L, 
374L, 375L, 376L, 389L, 390L, 391L, 392L, 401L, 402L, 403L, 404L, 
805L, 806L, 807L, 808L, 821L, 822L, 823L, 824L, 837L, 838L, 839L, 
840L), class = "data.frame")

The function to be fitted is a hockeystick curve; i.e. it flattens off after the bending point tb:

hockeystick <- function (t, C0, k1, k2, tb) 
{
  Ct <- ifelse(t <= tb, C0 - k1 * t, C0 - k1 * tb - k2 * t)
  Ct
}

Fitting using nls:

start.hockey<-c(C0=3,k1=1,k2=0.1,tb=3)
nls(log(Ct)~hockeystick(t,C0,k1,k2,tb),start=start.hockey,data=mydata)

No matter what starting values I use, I always get this error:

Error in nlsModel(formula, mf, start, wts) : 
  singular gradient matrix at initial parameter estimates

I tried both the "port" and the default nls algorithms. I also tried both the linearized form of the model (shown here) and the untransformed form, but neither works.
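For reference, the "port" attempt looked roughly like this (a sketch, not the exact call I used):

nls(log(Ct) ~ hockeystick(t, C0, k1, k2, tb),
    start = start.hockey, data = mydata,
    algorithm = "port")   ## this variant also fails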

Edit: As per Carl's suggestion, I tried fitting the model to a dataset in which the Ct values were first averaged per value of t, and I still get the error.
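The averaging was along these lines (a minimal sketch; `mydata.avg` is just a placeholder name):

mydata.avg <- aggregate(Ct ~ t, data = mydata, FUN = mean)  ## one mean Ct per value of t
nls(log(Ct) ~ hockeystick(t, C0, k1, k2, tb),
    start = start.hockey, data = mydata.avg)                ## same error as before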

Edit: Changed the model slightly so that the k2 value is positive rather than negative; a negative value does not make sense kinetically.

Pinemangoes
  • Try plotting `hockeystick(mydata$t,C0,k1,k2,tb)` vs `mydata$t` (a sketch of this check follows these comments). It's not a hockey stick. Further, your repeated values of `t` are almost guaranteed to cause regression to fail. – Carl Witthoft Sep 11 '14 at 13:38
  • So in general, I'd be better off fitting the regression to averaged values per value of t? The hockeystick is in linearized form, so the y-axis is in logarithmic units. – Pinemangoes Sep 11 '14 at 13:58
  • tb is the 'bending point' of the hockeystick model, the point at which the curve changes its 'descent rate'. – Pinemangoes Sep 11 '14 at 14:06
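Carl's check can be done along these lines (my sketch, evaluating the model at the starting values from above):

## evaluate the hockeystick model at the starting values and look at its shape
with(as.list(start.hockey),
     plot(mydata$t, hockeystick(mydata$t, C0, k1, k2, tb),
          xlab = "t", ylab = "model value"))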

1 Answer


I haven't quite solved the nls() problem, but I have a few suggestions.

First of all, I would suggest revising your hockey stick function slightly to make it continuous at the breakpoint:

hockeystick <- function (t, C0, k1, k2, tb) 
{
   ## the slope changes by k2 at the breakpoint, so the two pieces meet at t = tb
   Ct <- ifelse(t <= tb, C0 - k1 * t, C0 - k1 * t - k2 * (t - tb))
   Ct
}

Eyeballing:

par(las=1,bty="l") ## cosmetic
plot(log(Ct)~t,data=mydata)
curve(hockeystick(x,C0=0,k1=0.8,k2=-0.7, tb=3),add=TRUE)

[Plot: log(Ct) vs t with the eyeballed hockey-stick curve overlaid]

I've made k2 negative here so that the downward slope in the second stage is shallower than in the first stage (with this parameterization the slope after tb is -(k1 + k2)).

start.hockey <- c(C0 = 0, k1 = 0.8, k2 = -0.7, tb = 3)
nls(log(Ct) ~ hockeystick(t, C0, k1, k2, tb),
    start = start.hockey, data = mydata)

This still fails with the singular-gradient error. Models with breakpoints are often non-differentiable in the parameters, but I don't quite see how that's a problem here ...

This does work:

library(bbmle)
## Gaussian likelihood (dnorm) makes this the equivalent of least squares;
## logsd is the log of the residual standard deviation
m1 <- mle2(log(Ct) ~ dnorm(hockeystick(t, C0, k1, k2, tb),
                           sd = exp(logsd)),
           start = c(as.list(start.hockey), list(logsd = 0)),
           data = mydata)

The parameters are reasonable (and different from the starting values):

coef(summary(m1))
##         Estimate Std. Error   z value        Pr(z)
## C0    -0.4170749  0.2892128 -1.442104 1.492731e-01
## k1     0.6720120  0.2236111  3.005271 2.653439e-03
## k2    -0.5285974  0.2400605 -2.201934 2.766994e-02
## tb     2.0007688  0.1714292 11.671108 1.790751e-31
## logsd -0.2218745  0.1178580 -1.882558 5.976033e-02

Plot predictions:

pframe <- data.frame(t=seq(0,15,length=51))
pframe$pred <- predict(m1,newdata=pframe)
with(pframe,lines(t,pred,col=2))

[Plot: the data with the fitted prediction line from m1 overlaid]

Ben Bolker
  • Thanks for the comprehensive answer. Yes, your modification from `tb` to `t-tb` is a big improvement. I'll check out the rest of your answer when I get home later tonight. – Pinemangoes Sep 11 '14 at 14:28
  • Could you please explain why you wrap the `dnorm` function around `hockeystick()` for the estimation? Also, for this particular compound I'll probably need to get data beween the 2-10 range for `t`... – Pinemangoes Sep 11 '14 at 18:30
  • Also, it seems the `tb` parameter is not changed at all during the fitting. Whatever value I use in hockey.start is preserved in the final parameter set. While I can usually make an educated guess from the data, I can't make that stick in a paper. – Pinemangoes Sep 11 '14 at 18:37
  • The `tb` parameter did change for the test set (see edits above). If you're using reasonable starting values, I don't know why the optimizer would get stuck. I'm using `dnorm()` because `mle2` doesn't do least squares; it does general maximum-likelihood estimation, and a Gaussian (`dnorm()`) likelihood is the equivalent of least-squares fitting. – Ben Bolker Sep 11 '14 at 18:46
  • It would be more efficient and robust to fit a piecewise linear model using `lm` for a specified breakpoint and then do 1-D nonlinear optimization (`optimize`) to choose the breakpoint (a rough sketch follows below), or use the `strucchange` package. – Ben Bolker Sep 11 '14 at 18:48
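A minimal sketch of that `lm` + `optimize` idea (not from the answer itself; the helper name `ssq_for_tb` and the 0.5-10 search interval are my own choices, and it assumes the continuous parameterization in which the slope changes by k2 at tb):

## profile the residual sum of squares over candidate breakpoints
ssq_for_tb <- function(tb, data) {
  ## the extra slope term applies only beyond the breakpoint
  fit <- lm(log(Ct) ~ t + pmax(t - tb, 0), data = data)
  deviance(fit)  ## residual sum of squares for this candidate tb
}
## 1-D search for the breakpoint within a plausible range of t
opt <- optimize(ssq_for_tb, interval = c(0.5, 10), data = mydata)
tb.hat <- opt$minimum
## refit at the chosen breakpoint; the coefficient on pmax() plays the role of k2
final.fit <- lm(log(Ct) ~ t + pmax(t - tb.hat, 0), data = mydata)
coef(final.fit)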