Why is the nls function returning such different values, for the same model, with similar datasets?

Question

I have two sets of age and length data for the same fish species, both provided in the following link.

I would like to fit a growth model in R that allows for a change in growth rate at a specific point in the lifespan.

I tried the nls function with starting values adapted to my data. The model is an adaptation of the von Bertalanffy growth model with five parameters (Linf, K0, t0, K1, and t1).

The code I used, for both datasets, was the following:

fit <- as.formula(TL ~ Linf * (1 - exp(-K0 * (Age - t0))) * (Age < t1) +
                    Linf * (1 - exp(-K0 * (t1 - t0) - K1 * (Age - t1))) * (Age > t1))

model <- nls(fit, data = dataset,
             start = list(Linf = 17, K0 = 0.3, t0 = -2, K1 = 0.1, t1 = 3),
             control = nls.control(maxiter = 500, tol = 1e-03, minFactor = 1/1024,
                                   printEval = FALSE, warnOnly = FALSE))
summary(model)

For the first dataset the values returned were the following:

Formula: TL ~ Linf * (1 - exp(-K0 * (Age - t0))) * (Age < t1) + Linf * 
    (1 - exp(-K0 * (t1 - t0) - K1 * (Age - t1))) * (Age > t1)

Parameters:
       Estimate Std. Error t value Pr(>|t|)    
Linf  4.089e+02  1.565e+04   0.026   0.9792    
K0    5.477e-03  2.141e-01   0.026   0.9796    
t0   -2.934e+00  1.500e+00  -1.956   0.0511 .  
K1    7.596e-04  3.004e-02   0.025   0.9798    
t1    2.246e+00  2.143e-01  10.477   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.881 on 457 degrees of freedom

Number of iterations to convergence: 294 
Achieved convergence tolerance: 0.000979

For the second dataset, the values returned were:

Formula: TL ~ Linf * (1 - exp(-K0 * (Age - t0))) * (Age < t1) + Linf * 
    (1 - exp(-K0 * (t1 - t0) - K1 * (Age - t1))) * (Age > t1)

Parameters:
     Estimate Std. Error t value Pr(>|t|)    
Linf 15.04002    0.60919  24.689  < 2e-16 ***
K0    0.16740    0.01895   8.833  < 2e-16 ***
t0   -3.67353    0.34427 -10.671  < 2e-16 ***
K1    0.11986    0.02007   5.971 2.63e-09 ***
t1    2.29970    0.31711   7.252 5.18e-13 ***
---

Only the values returned for the second dataset make sense for the species in question.

Why is the nls function returning such different parameter values, while using the same model, same starting values and very similar datasets?

Try using the coefficients from the result of the data set that worked as starting values for the other data set. — G. Grothendieck, Mar 09 '23 at 14:20
@G. Grothendieck Even when I use the results from the dataset that worked as starting values, the values returned are very similar to the previous ones — Ines Silva, Mar 09 '23 at 14:26
Suggest producing some graphs with the fit superimposed to see if it makes sense. Maybe it just is different than you think? — G. Grothendieck, Mar 09 '23 at 14:30
@G.Grothendieck I think you're spot on with your second suggestion. — Allan Cameron, Mar 09 '23 at 14:41

Allan Cameron · Answer 1 · 2023-03-09T14:40:28.980

I don't think there's anything wrong with the fits per se - they both look like reasonable fits to the given data. The problem appears to be that in the first set there is an apparent change in gradient that occurs around an age where there are relatively few data points.
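As a rough numerical check, the estimates reported in the question can be plugged back into the model to compare predicted lengths. This is only a sketch: `vb_two_phase` is a hypothetical helper name introduced here, and the ages evaluated are arbitrary. With these estimates both curves predict lengths of the same order over ages 1 to 10, even though the individual parameters look very different.

```r
# Two-phase von Bertalanffy curve, written as a plain function.
# vb_two_phase is a hypothetical helper name, not part of the original code.
# Unlike the original formula, it returns the curve value (not 0) when
# age equals t1 exactly, because it does not rely on two strict inequalities.
vb_two_phase <- function(age, Linf, K0, t0, K1, t1) {
  ifelse(age < t1,
         Linf * (1 - exp(-K0 * (age - t0))),
         Linf * (1 - exp(-K0 * (t1 - t0) - K1 * (age - t1))))
}

# Estimates copied from the two summary() outputs in the question.
est1 <- list(Linf = 408.9,    K0 = 0.005477, t0 = -2.934,   K1 = 0.0007596, t1 = 2.246)
est2 <- list(Linf = 15.04002, K0 = 0.16740,  t0 = -3.67353, K1 = 0.11986,   t1 = 2.29970)

# Arbitrary ages chosen for illustration.
ages  <- c(1, 2, 4, 6, 8, 10)
pred1 <- do.call(vb_two_phase, c(list(age = ages), est1))
pred2 <- do.call(vb_two_phase, c(list(age = ages), est2))

# Side-by-side predicted lengths for the two parameter sets.
print(round(data.frame(Age = ages, fit1 = pred1, fit2 = pred2), 2))

# In the first fit K0 * (Age - t0) is tiny, so 1 - exp(-x) is close to x and
# the curve is nearly two straight lines with slopes Linf * K0 and Linf * K1.
print(with(est1, c(slope_before = Linf * K0, slope_after = Linf * K1)))
```

If the first fit is close to two straight lines, the data mostly determine the products Linf * K0 and Linf * K1 rather than each parameter separately, which would be consistent with the very large standard errors on Linf, K0 and K1.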

Here's the plot for the first data set:

library(ggplot2)

fit <-as.formula(y~ Linf * (1 - exp(-K0 * (x - t0))) * (x < t1) +
                   Linf * (1 - exp(-K0 * (t1 - t0) - K1 * (x - t1))) * (x > t1))

ggplot(dataset, aes(Age, TL)) +
  geom_point() +
  geom_smooth(method = nls, formula = fit, method.args = list(
    start = list(Linf=17, K0=0.3, t0=-2, K1=0.1, t1=3), 
    control = list(maxiter = 10000, minFactor = 1e-9, tol = 1e-3)),
    se = FALSE, linetype = 2
  )

But the data, and the shape of the plot, are quite different for the second data set:

ggplot(dataset2, aes(Age, TL)) +
  geom_point() +
  geom_smooth(method = nls, formula = fit, method.args = list(
    start = list(Linf=17, K0=0.3, t0=-2, K1=0.1, t1=3), 
    control = list(maxiter = 10000, minFactor = 1e-9, tol = 1e-3)),
    se = FALSE, linetype = 2
  )

So the problem simply lies in your assumption that both data sets are similar. They are not very similar at all, at least in terms of fitting this model. For example, the first data set has only 52 individuals (11%) under the age of 4, whereas the second has 1279 (42%). The age distributions of the two samples are clearly very different. Note that combining the two data frames with rbind and fitting the pooled data gives parameter estimates similar to those obtained for dataset2 alone.
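The age-distribution comparison can be sketched as follows. The two data frames below are small toy stand-ins with made-up values, not the real data, and `share_under` is a hypothetical helper name; with the real `dataset` and `dataset2` only the helper and the last few lines would be needed.

```r
# Toy stand-ins for the two data frames (hypothetical values, NOT the real data).
dataset  <- data.frame(Age = c(2, 3, 5, 6, 7, 8, 9, 10),
                       TL  = c(10.5, 11.0, 12.0, 12.4, 12.9, 13.1, 13.5, 13.8))
dataset2 <- data.frame(Age = c(1, 2, 2, 3, 3, 5, 7, 9),
                       TL  = c(8.2, 9.3, 9.1, 10.2, 10.4, 11.3, 12.1, 12.6))

# Count and share of individuals younger than a cutoff age.
# share_under is a hypothetical helper name.
share_under <- function(df, cutoff = 4) {
  c(n = sum(df$Age < cutoff), share = mean(df$Age < cutoff))
}

print(share_under(dataset))
print(share_under(dataset2))

# Pool the two samples; `pooled` can then be passed as `data`
# to the same nls() call used for the separate fits.
pooled <- rbind(dataset, dataset2)

# Number of individuals per age band in the pooled sample.
print(table(cut(pooled$Age, breaks = c(0, 2, 4, 6, 8, Inf))))
```

Tabulating ages this way before fitting makes it easier to spot when a breakpoint such as t1 falls in a region with few observations.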

Ain't it amazing what a little time graphing raw data can do? :-) — Carl Witthoft, Mar 09 '23 at 15:45
