R nls() Initial Parameter Problem, nonlinear Regression

Question

I get an error message:

Error in nlsModel(formula, mf, start, wts) : 
  singular gradient matrix at initial parameter estimates

when using the nls() function like this:

form_Q10_parabolic_SM <- as.formula(Lin_Flux..mymol.m.2.s.1. ~ (rRef<- 5.5354)*a*exp(b*Mean_Soil_Temp_V2..C.)*((-c*Soil_Moist_V3**2)+(d*Soil_Moist_V3)+e))
Q10_parabolic_SM <- nls(form_Q10_parabolic_SM, data = conB1_2015, start = list(a = 1, b = 0.11, c = 0.0001, d = 0.01, e = 0.1))

I got my initial parameters by using the preview() function of the nlstools library like this (same formula definition as above):

preview(form_Q10_parabolic_SM, data = conB1_2015, start = c(a = 1, b = 0.11, c = 0.0001, d = 0.01, e = 0.1), variable = 1)

Which gives me this output with the parameters a-e above:

This looks quite good to my eye, and I really don't know what to do at this point, since preview() works just fine.

Is my model too complex or overparameterized? Or did I just do something wrong with the nls function?

Any tips would be really appreciated!

> dput(head(conB1_2015, 30))
structure(list(X = c(13L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 
75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 
88L, 89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L), IV_Date = c("2015-01-14", 
"2015-03-11", "2015-03-12", "2015-03-13", "2015-03-14", "2015-03-15", 
"2015-03-16", "2015-03-17", "2015-03-18", "2015-03-19", "2015-03-20", 
"2015-03-21", "2015-03-22", "2015-03-23", "2015-03-24", "2015-03-25", 
"2015-03-26", "2015-03-27", "2015-03-28", "2015-03-29", "2015-03-30", 
"2015-03-31", "2015-04-01", "2015-04-02", "2015-04-03", "2015-04-04", 
"2015-04-05", "2015-04-06", "2015-04-07", "2015-04-08"), SMmean010.... = c(24.5341666666667, 
23.4754166666667, 23.0585416666667, 22.830625, 22.7447916666667, 
22.7729166666666, 22.7929166666667, 22.7354166666667, 22.6579166666667, 
22.5935416666667, 22.5233333333333, 22.7641666666667, 23.6010416666667, 
23.445625, 23.404375, 23.2845833333333, 23.0672916666667, 22.9347916666667, 
22.8272916666667, 23.0316666666667, 23.988125, 25.5647916666667, 
27.055, 27.7995833333333, 26.23125, 25.4658333333333, 25.0845833333333, 
24.8175, 24.605, 24.4216666666667), Lin_Flux..mymol.m.2.s.1. = c(1.13, 
2.146, 1.98708333333333, 1.88416666666667, 1.57083333333333, 
1.93041666666667, 2.69875, 2.8075, 3.23272727272727, 2.35818181818182, 
2.23833333333333, 1.84958333333333, 2.18695652173913, 2.16958333333333, 
2.69791666666667, 3.025, 1.985, 1.88083333333333, 2.30416666666667, 
2.775, 1.44458333333333, 1.78791666666667, 1.04863636363636, 
1.03458333333333, 1.4725, 1.86833333333333, 1.71125, 1.79, 1.53166666666667, 
1.97666666666667), Mean_Soil_Temp_V2..C. = c(4.739, 5.1864, 4.08408333333333, 
3.61625, 3.68508333333333, 4.09925, 4.87079166666667, 5.64720833333333, 
6.58433333333333, 5.05075, 4.93708333333333, 4.109, 3.2295, 3.537, 
5.1395, 5.65270833333333, 5.931875, 5.61775, 5.88695833333333, 
6.86308333333333, 5.61833333333333, 4.24566666666667, 3.05952173913043, 
2.45716666666667, 3.6365, 3.68820833333333, 3.83766666666667, 
4.3435, 4.8745, 6.29133333333333), Soil_Moist_V3 = c(25.603137, 
21.98744709, 21.8053864833333, 21.6770563291667, 20.1319423708333, 
19.9826592666667, 19.8279438958333, 20.1589541791667, 21.5796382, 
21.5971315083333, 21.3742824541667, 21.8992939333333, 23.9737254583333, 
23.4506886041667, 23.0956395708333, 22.574581225, 22.3561680833333, 
21.3806269916667, 21.4045219791667, 21.5611478916667, 25.5090813166667, 
28.6440265, 31.4434210347826, 31.9276734541667, 27.5706909333333, 
25.1139413583333, 24.2945348333333, 24.0232171416667, 23.705631425, 
22.8323341625), precip50..mm. = c(0.6, 0, 0, 0, 0.9, 1.3, 0, 
0, 0, 0, 0, 6.6, 0, 0, 0, 0, 0.1, 0.2, 0.1, 6.1, 5, 17.6, 10.4, 
6.6, 0, 0, 0, 0, 0, 0), RWI = c(0.6, 0.4, 0.2, 0.133333333333333, 
0.9, 1.3, 1.3, 0.65, 0.433333333333333, 0.325, 0.26, 6.6, 6.6, 
3.3, 2.2, 1.65, 0.1, 0.2, 0.1, 6.1, 5, 17.6, 10.4, 6.6, 6.6, 
3.3, 2.2, 1.65, 1.32, 1.1)), na.action = structure(c(`1` = 1L, 
`2` = 2L, `3` = 3L, `4` = 4L, `5` = 5L, `6` = 6L, `7` = 7L, `8` = 8L, 
`9` = 9L, `10` = 10L, `11` = 11L, `12` = 12L, `13` = 13L, `15` = 15L, 
`16` = 16L, `17` = 17L, `18` = 18L, `19` = 19L, `20` = 20L, `21` = 21L, 
`22` = 22L, `23` = 23L, `24` = 24L, `25` = 25L, `26` = 26L, `27` = 27L, 
`28` = 28L, `29` = 29L, `30` = 30L, `31` = 31L, `32` = 32L, `33` = 33L, 
`34` = 34L, `35` = 35L, `36` = 36L, `37` = 37L, `38` = 38L, `39` = 39L, 
`40` = 40L, `41` = 41L, `42` = 42L, `43` = 43L, `44` = 44L, `45` = 45L, 
`46` = 46L, `47` = 47L, `48` = 48L, `49` = 49L, `50` = 50L, `51` = 51L, 
`52` = 52L, `53` = 53L, `54` = 54L, `55` = 55L, `56` = 56L, `57` = 57L, 
`58` = 58L, `59` = 59L, `60` = 60L, `61` = 61L, `62` = 62L, `63` = 63L, 
`64` = 64L, `65` = 65L, `66` = 66L, `67` = 67L, `68` = 68L, `199` = 199L, 
`218` = 218L, `219` = 219L, `220` = 220L, `221` = 221L, `222` = 222L, 
`223` = 223L, `224` = 224L, `225` = 225L, `226` = 226L, `227` = 227L, 
`228` = 228L, `229` = 229L, `230` = 230L, `231` = 231L, `232` = 232L, 
`264` = 264L, `265` = 265L, `266` = 266L, `267` = 267L, `352` = 352L, 
`353` = 353L, `354` = 354L, `355` = 355L, `356` = 356L, `357` = 357L, 
`358` = 358L, `359` = 359L, `360` = 360L, `361` = 361L, `362` = 362L, 
`363` = 363L, `364` = 364L, `365` = 365L, `366` = 366L), class = "omit"), row.names = c(14L, 
69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L, 81L, 
82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L, 91L, 92L, 93L, 94L, 
95L, 96L, 97L), class = "data.frame")

Can you post sample data? Please edit **the question** with the output of `dput(conB1_2015)`. Or, if it is too big, with the output of `dput(head(conB1_2015, 30))`. — Rui Barradas, May 31 '20 at 20:15

G. Grothendieck · Answer 1 · 2020-06-01T13:45:17.830

1. The main problem is that the parameters are not uniquely identifiable. We can multiply a by an arbitrary nonzero number and divide c, d and e by that same number and we get the same model. Omit a.
2. Although it won't hurt, the use of as.formula is redundant since the argument is already a formula.
3. Having an assignment within an nls formula is highly unusual. nls will think that rRef is a parameter and fail on that account. Remove the assignment.
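
The first point can be checked numerically. The following is only a sketch: the helper flux_model and the temp and moist values are made up for illustration and are not the posted data. It shows that two different parameter sets give identical predictions, and that the gradient column for a is a linear combination of the columns for c, d and e, which is what a singular gradient matrix means here.

```r
# Hypothetical helper: the model from the question, with rRef as a plain number.
rRef <- 5.5354
flux_model <- function(p, temp, moist) {
  rRef * p[["a"]] * exp(p[["b"]] * temp) *
    (-p[["c"]] * moist^2 + p[["d"]] * moist + p[["e"]])
}

# Made-up predictor values, roughly in the range of the posted data.
temp  <- c(2.5, 3.6, 4.7, 5.6, 6.9)
moist <- c(19.8, 21.6, 23.9, 27.6, 31.9)

p1 <- c(a = 1, b = 0.11, c = 0.0001, d = 0.01, e = 0.1)

# Multiply a by k and divide c, d and e by the same k.
k  <- 7
p2 <- p1
p2["a"] <- p1["a"] * k
p2[c("c", "d", "e")] <- p1[c("c", "d", "e")] / k

# Both parameter sets predict exactly the same fluxes.
same_fit <- isTRUE(all.equal(flux_model(p1, temp, moist),
                             flux_model(p2, temp, moist)))

# Analytic gradient columns at p1: a * J_a equals c * J_c + d * J_d + e * J_e,
# so the gradient matrix cannot have full column rank.
dependent <- with(as.list(p1), {
  w   <- rRef * exp(b * temp)
  J_a <- w * (-c * moist^2 + d * moist + e)
  J_c <- -a * w * moist^2
  J_d <- a * w * moist
  J_e <- a * w
  isTRUE(all.equal(a * J_a, c * J_c + d * J_d + e * J_e))
})

same_fit
dependent
```

Because the dependence holds for any parameter values, no choice of starting values can fix it; one of the redundant parameters has to go.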

With these changes, nls does give an answer on the data in the updated version of the question.

form_Q10_parabolic_SM <- Lin_Flux..mymol.m.2.s.1. ~ 
 exp(b*Mean_Soil_Temp_V2..C.) * ( (-c*Soil_Moist_V3**2) + (d*Soil_Moist_V3) + e)

Q10_parabolic_SM <- nls(form_Q10_parabolic_SM, data = conB1_2015, 
  start = list(b = 0.11, c = 0.0001, d = 0.01, e = 0.1))

giving:

> Q10_parabolic_SM
Nonlinear regression model
  model: Lin_Flux..mymol.m.2.s.1. ~ exp(b * Mean_Soil_Temp_V2..C.) * ((-c *     Soil_Moist_V3^2) + (d * Soil_Moist_V3) + e)
   data: conB1_2015
        b         c         d         e 
 0.103062 -0.001564 -0.135531  3.528621 
 residual sum-of-squares: 3.979

Number of iterations to convergence: 6 
Achieved convergence tolerance: 4.401e-06

plinear

Note that nls also has the plinear algorithm, which has the advantage that only the nonlinear parameters (in this case only b) need starting values. With plinear, the formula's RHS should be a matrix whose columns each multiply one linear parameter. It gives essentially the same answer as above, except that the linear parameters are given names starting with .lin. The plinear version also converges in fewer iterations (3 versus 6) than the version using the default algorithm above. It also seems not to be very sensitive to the starting value: even with b = 1 as the starting value it converges.

fo <- Lin_Flux..mymol.m.2.s.1. ~ 
  cbind(-Soil_Moist_V3**2, Soil_Moist_V3, 1) * exp(b*Mean_Soil_Temp_V2..C.)
fm <- nls(fo, data = conB1_2015, start = list(b = 0.11), algorithm = "plinear")

giving:

> fm
Nonlinear regression model
  model: Lin_Flux..mymol.m.2.s.1. ~ cbind(-Soil_Moist_V3^2, Soil_Moist_V3,     1) * exp(b * Mean_Soil_Temp_V2..C.)
   data: conB1_2015
                 b              .lin1 .lin.Soil_Moist_V3              .lin3 
          0.103062          -0.001564          -0.135528           3.528593 
 residual sum-of-squares: 3.979

Number of iterations to convergence: 3 
Achieved convergence tolerance: 2.189e-06
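
To see why only b needs a starting value, here is a rough sketch of the idea behind a partially linear fit. It is not how nls implements plinear (the documentation describes that as a Golub-Pereyra algorithm). For any fixed b, the other three coefficients enter linearly, so they can be solved by ordinary least squares, which leaves a one-dimensional search over b. The data are simulated, and the names temp, moist, flux, design and profile_rss are made up for illustration.

```r
# Simulated data (not the posted data): true b = 0.10, small noise.
set.seed(1)
n     <- 60
temp  <- runif(n, 2, 7)
moist <- runif(n, 19, 32)
flux  <- exp(0.10 * temp) * (-0.002 * moist^2 + 0.10 * moist + 0.5) +
  rnorm(n, sd = 0.02)

# For a given b, the model is linear in the remaining coefficients:
# each row of the 3-column matrix is scaled by exp(b * temp).
design <- function(b) cbind(-moist^2, moist, 1) * exp(b * temp)

# Residual sum of squares after solving the linear part by least squares.
profile_rss <- function(b) sum(lm.fit(design(b), flux)$residuals^2)

# One-dimensional search over the single nonlinear parameter.
opt   <- optimize(profile_rss, interval = c(0, 0.5))
b_hat <- opt$minimum

# Linear coefficients at the best b (the analogue of the .lin values).
lin_hat <- lm.fit(design(b_hat), flux)$coefficients

b_hat
lin_hat
```

The search interval c(0, 0.5) is an arbitrary choice for this simulated example; for real data it would need to bracket plausible values of b.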

I omitted a, but still get the same error message. I need the assignments since I want to automate the regression over several CSV files with similar data, and these values will change slightly with each table. But the nls function works with these assignments on simpler regressions I did, so that can't be the problem, I guess. — Moe, Jun 01 '20 at 10:13
I can't reproduce the error. Now that you have provided data I ran it with that data and the 3 recommended changes and it gives the answer shown. — G. Grothendieck, Jun 01 '20 at 10:42
Thank you for your answer, I removed the rRef variable and it does work like that. I'm still wondering why that variable is such a problem since the regression with Lin_Flux..mymol.m.2.s.1. ~ (rRef)*a*exp(b*Mean_Soil_Temp_V2..C.) works without a problem. I may try the plinear approach, thanks for that info! — Moe, Jun 01 '20 at 13:39
It likely extracts the variable names from the formula and expects that each one is either data or a parameter but rRef is neither. — G. Grothendieck, Jun 01 '20 at 13:49
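
The name extraction mentioned in the last comment can be illustrated with base R's all.vars(). Whether nls treats the names in exactly this way internally is an assumption here; the sketch only shows which names a formula exposes. The variables y and x and the helper make_formula are made up for illustration.

```r
# A formula with an assignment inside it, as in the question.
fo <- y ~ (rRef <- 5.5354) * a * exp(b * x)

# rRef shows up as just another name in the formula.
vars <- all.vars(fo)
vars

# Names not covered by the starting values would have to be found as data.
start    <- list(a = 1, b = 0.11)
leftover <- setdiff(vars, names(start))
leftover

# Alternative without an assignment: build the formula inside a function,
# so rRef is an ordinary variable in the formula's environment.
make_formula <- function(rRef) y ~ rRef * a * exp(b * x)
fo2       <- make_formula(5.5354)
rRef_seen <- get("rRef", envir = environment(fo2))
rRef_seen
```

For automating over several tables, a function like make_formula could be called once per file with that file's rRef value, so the formula itself never contains an assignment.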
