
I have some difficulties getting a specific curve to fit in R, while it works perfectly fine in a commercial curve-fitting program.

The formula that the data should fit to is:

y(t) = A*exp(-a*t) + B*exp(-b*t) - (A+B)*exp(-c*t)
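For reference, the same model written as a plain R function (same symbols as the formula above):

```r
# Sum of two exponentials minus a third whose amplitude is tied to the
# first two, so that y(0) = A + B - (A + B) = 0, matching data that start at 0.
y_model <- function(t, A, B, a, b, c) {
  A * exp(-a * t) + B * exp(-b * t) - (A + B) * exp(-c * t)
}
```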

So for this I want to use the nonlinear regression built into R. I've been at this on and off for a day now and just can't get it to work. The issue lies entirely with the initial values, so I'm using the nls2 package to brute-force a search for them.

library(nls2)  # nls2() below comes from this package

y <- c(0,0.01377,0.01400875,0.0119175,0.00759375,0.00512125,0.004175,0.00355375,
0.00308875,0.0028925,0.00266375)
t <- c(0,3,6,12,24,48,72,96,120,144,168)
df <- data.frame(t,y)
plot(t, y)
# Our model:
fo <- y ~ f1*exp(-k1*t) + f2*exp(-k2*t) - (f1+f2)*exp(-k3*t)

#Define the outer boundaries to search for initial values
grd <- data.frame(f1 = c(0, 1),
                  f2 = c(0, 1),
                  k1 = c(0, 2),
                  k2 = c(0, 2),
                  k3 = c(0, 0.7))

#Do the brute-force
fit <- nls2(fo,
        data=df,
        start = grd,
        algorithm = "brute-force",
        control=list(maxiter=20000))
fit
coef(fit)
final <- nls(fo, data=df, start=as.list(coef(fit)))

The values it should give are:

f1  0.013866
f2  0.005364
k1  0.063641
k2  0.004297
k3  0.615125

Yet even with quite high iteration counts, I'm just getting nonsense back. I'm clearly doing something wrong, but I can't see what.

Edit based on @Roland's comment:

The method you propose, approximating k1–k3 with a partially linear fit, works on some datasets but not on all of them. Below is the code I'm now using based on your input.

#Oral example:
y <- c(0,0.0045375,0.0066325,0.00511375,0.00395875,0.003265,0.00276,
0.002495,0.00231875);
t <- c(0,12,24,48,72,96,120,144,168);
#IV example:
#y <- c(0,0.01377,0.01400875,0.0119175,0.00759375,0.00512125,0.004175,
#0.00355375,0.00308875,0.0028925,0.00266375)
#t <- c(0,3,6,12,24,48,72,96,120,144,168)
DF <- data.frame(y, t)
fit1 <- nls(y ~ cbind(exp(-k1*t), exp(-k2*t), exp(-k3*t)), algorithm = "plinear", data = DF,
            start = list(k1 = 0.002, k2 = 0.02, k3= 0.2))
k1_predict <-summary(fit1)$coefficients[1,1]
k2_predict <-summary(fit1)$coefficients[2,1]
k3_predict <-summary(fit1)$coefficients[3,1]
fo <- y ~ f1*exp(-k1*t)+f2*exp(-k2*t)-(f1+f2)*exp(-k3*t);
fit2 <- nls(fo, data = DF, 
            start = list(k1 = k1_predict, k2 = k2_predict, k3 = k3_predict, f1 = 0.01, f2 = 0.01))
summary(fit2);
plot(t,y);
curve(predict(fit2, newdata = data.frame(t = x)), 0, 200, add = TRUE, col = "red")

[plot: oral example fit]

@G. Grothendieck Like Roland's approach, your suggestion also fits some datasets well but struggles with others. The code below, based on your input, exits with a "singular gradient matrix" error.

#Oral example:
y <- c(0,0.0045375,0.0066325,0.00511375,0.00395875,0.003265,0.00276,
0.002495,0.00231875);
t <- c(0,12,24,48,72,96,120,144,168);
#IV example:
#y <- c(0,0.01377,0.01400875,0.0119175,0.00759375,0.00512125,0.004175,
#0.00355375,0.00308875,0.0028925,0.00266375)
#t <- c(0,3,6,12,24,48,72,96,120,144,168)
df <- data.frame(y, t)
grd <- data.frame(f1 = c(0, 1),
                  f2 = c(0, 1),
                  k1 = c(0, 2),
                  k2 = c(0, 2),
                  k3 = c(0, 0.7))
set.seed(123)
fit <- nls2(fo,
        data=df,
        start = grd,
        algorithm = "random",
        control = nls.control(maxiter = 100000))
fit3 <- nls(fo, df, start = coef(fit), algorithm = "port", lower = 0)
plot(t, y)
curve(predict(fit3, newdata = data.frame(t = x)), 0, 200, add = TRUE, col = "red")
redtails
2 Answers


I would first do a partially linear fit with no constraints on the linear parameters to get good starting values for the exponential parameters and some idea regarding the magnitude of the linear parameters:

DF <- data.frame(y, t)
fit1 <- nls(y ~ cbind(exp(-k1*t), exp(-k2*t), exp(-k3*t)), algorithm = "plinear", data = DF,
            start = list(k1 = 0.002, k2 = 0.02, k3= 0.2))
summary(fit1)
#Formula: y ~ cbind(exp(-k1 * t), exp(-k2 * t), exp(-k3 * t))
#
#Parameters:
#        Estimate Std. Error t value Pr(>|t|)    
#k1     0.0043458  0.0010397   4.180 0.008657 ** 
#k2     0.0639379  0.0087141   7.337 0.000738 ***
#k3     0.6077646  0.0632586   9.608 0.000207 ***
#.lin1  0.0053968  0.0006637   8.132 0.000457 ***
#.lin2  0.0139231  0.0008694  16.014 1.73e-05 ***
#.lin3 -0.0193145  0.0010631 -18.168 9.29e-06 ***

Then you can fit your actual model:

fit2 <- nls(fo, data = DF, 
            start = list(k1 = 0.06, k2 = 0.004, k3 = 0.6, f1 = 0.01, f2 = 0.01))
summary(fit2)  
#Formula: y ~ f1 * exp(-k1 * t) + f2 * exp(-k2 * t) - (f1 + f2) * exp(-k3 * t)
#
#Parameters:
#    Estimate Std. Error t value Pr(>|t|)    
#k1 0.0639344  0.0079538   8.038 0.000198 ***
#k2 0.0043456  0.0009492   4.578 0.003778 ** 
#k3 0.6078929  0.0575616  10.561 4.24e-05 ***
#f1 0.0139226  0.0007934  17.548 2.20e-06 ***
#f2 0.0053967  0.0006059   8.907 0.000112 ***         

curve(predict(fit2, newdata = data.frame(t = x)), 0, 200, add = TRUE, col = "red")

[plot: resulting fit]

Note that this model can easily be re-parameterized by switching the exponential terms (i.e., the order of the kn starting values), which could result in different estimates for f1 and f2, but basically the same fit.
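To see this symmetry directly, one can refit with the k1 and k2 starting values swapped (a quick check, reusing fo, DF and fit2 from above):

```r
# Same model, starting values for k1 and k2 exchanged: the estimated
# (k, f) pairs may come back in a different order, but the fitted
# curve should be essentially the same.
fit2b <- nls(fo, data = DF,
             start = list(k1 = 0.004, k2 = 0.06, k3 = 0.6, f1 = 0.01, f2 = 0.01))
all.equal(fitted(fit2), fitted(fit2b), tolerance = 1e-6)
```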

Roland
  • I've made an amendment to the starting post based on your input. Your suggestion seems to partially solve the issue, though on some datasets I still struggle to find a good fit with R. If you can glance at it another time, it could really help my research – redtails Nov 11 '15 at 15:45
  • You have (i) a complex five parameter model and (ii) in the critical time range (around the "peak") very few data points to support this model. I believe that small deviations (e.g., due to measurement uncertainty) of the second data point have a strong impact on the fit and can lead to convergence problems. – Roland Nov 11 '15 at 16:21
  • All right, thank you kindly. Your contribution has helped me to better model my IV/oral data :) – redtails Nov 13 '15 at 16:29
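Roland's point about sensitivity near the peak can be checked by perturbing the second data point and refitting (a sketch; fo, DF and fit2 as in the answer above):

```r
# Nudge the observation closest to the peak by 10% and refit from the
# same starting values, then compare how far the estimates move.
# try() guards against the convergence failures discussed above.
DF2 <- DF
DF2$y[2] <- DF2$y[2] * 1.1
fit2p <- try(nls(fo, data = DF2,
                 start = list(k1 = 0.06, k2 = 0.004, k3 = 0.6,
                              f1 = 0.01, f2 = 0.01)))
if (!inherits(fit2p, "try-error"))
  rbind(original = coef(fit2), perturbed = coef(fit2p))
```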

With this many parameters I would use algorithm = "random" rather than "brute-force". If we do that, then the following gives a result close to the one in the question (up to a permutation of the parameters due to the symmetry of the model):

set.seed(123)
fit <- nls2(fo,
        data=df,
        start = grd,
        algorithm = "random",
        control = nls.control(maxiter = 20000))
nls(fo, df, start = coef(fit), alg = "port", lower = 0)

giving:

Nonlinear regression model
  model: y ~ f1 * exp(-k1 * t) + f2 * exp(-k2 * t) - (f1 + f2) * exp(-k3 * t)
   data: df
      f1       f2       k1       k2       k3 
0.005397 0.013923 0.004346 0.063934 0.607893 
 residual sum-of-squares: 2.862e-07

Algorithm "port", convergence message: relative convergence (4)

ADDED

A variation of the above is to use nlsLM in the minpack.lm package instead of nls and to use splines to get more points in the data set. In place of the nls line try the following. It still gives convergence:

library(minpack.lm)
t_s <- with(df, min(t):max(t))
df_s <- setNames(data.frame(spline(df$t, df$y, xout = t_s)), c("t", "y"))
nlsLM(fo, df_s, start = coef(fit), lower = rep(0,5), control = nls.control(maxiter = 1024))

and it also converges on the oral example:

set.seed(123)
y <- c(0,0.0045375,0.0066325,0.00511375,0.00395875,0.003265,0.00276,
0.002495,0.00231875);
t <- c(0,12,24,48,72,96,120,144,168)
DF <- data.frame(y, t)
grd <- data.frame(f1=c(0,1), f2=c(0,1), k1=c(0,2), k2=c(0,2), k3=c(0,0.7))
fit <- nls2(fo,
        data=DF,
        start = grd,
        algorithm = "random",
        control = nls.control(maxiter = 20000))

library(minpack.lm)
t_s <- with(DF, min(t):max(t))
df_s <- setNames(data.frame(spline(DF$t, DF$y, xout = t_s)), c("t", "y"))
nlsLM(fo, df_s, start = coef(fit), lower = rep(0,5), control = nls.control(maxiter = 1024))
G. Grothendieck
  • I've made an amendment on the original question, also based on your input. Your suggestion to brute-force with a random seed seems to work very efficiently on some datasets, though on other datasets I struggle to get a fit. If you have additional tips, it'd really help me – redtails Nov 11 '15 at 15:47
  • Thank you kindly, your contribution has been helpful for modelling my data :) – redtails Nov 13 '15 at 16:28
  • You are supposed to check off the one you favor, not use the comments for thanks. – G. Grothendieck Nov 13 '15 at 17:08