0

I am fairly new to R and I am trying to fit a curve using the nls function. I will first generate a curve y using the dgamma function, which I then want to fit using nls. Here is my toy example.

´´´
x <- 1:250
y <- dgamma(x,2,0.02)
df <- data.frame(x=x,y=y)

nls(y~ dgamma(x,a,b),data=df,start =  list(a =2,b =0.4))
´´´

The error that I am getting is

Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model In addition: Warning message: In dgamma(x, a, b) : NaNs produced

What am I doing wrong here?

Thanks

RSale
  • 463
  • 5
  • 14

1 Answers1

1

1) nls This is not a syntax error. The algorithm is not converging. The problems are:

  • dgamma is producing large and/or small numbers causing numeric instabilities

  • nls tends to have problems with zero residual data, i.e. exact fits. if you are using R 4.1, currently the development version of R, then adding the nls argument control = nls.control(scaleOffset = 1) may help to avoid such problems ( see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/nls.control.html ). Below we managed to get convergence without scaleOffset so it will work on the current version of R.

Use a spline fit to create more points, 1000 say, and then fit log(y) rather than y. Use the result of that to fit the original equation continuing to use the additional points.

x <- 1:250
y <- dgamma(x, 2, 0.02)

xx <- seq(1, 250, length = 1000)
spl <- spline(x, y, xout = xx)

fo.log <- log(y) ~ dgamma(x, a, b, log = TRUE)
fm.log <- nls(fo.log, data = spl, start = list(a = 2, b = 0.4))

fo <- y ~ dgamma(x, a, b)
fm <- nls(fo, spl, start = coef(fm.log))
fm

giving:

Nonlinear regression model
  model: y ~ dgamma(x, a, b)
   data: spl
   a    b 
2.00 0.02 
 residual sum-of-squares: 4.399e-19

Number of iterations to convergence: 1 
Achieved convergence tolerance: 7.806e-08

2) optim optim often works well on nonlinear least squares problems. With it we can get a fit directly without the workarounds above. The warnings that are produced can be ignored as it reached convergence (convergence = 0 in the output).

rss <- function(p, x, y) sum((y - dgamma(x, p[["a"]], p[["b"]]))^2)
optim(c(a = 2, b = 0.4), rss, x = x, y = y)

giving:

$par
        a         b 
1.9974423 0.0199842 

$value
[1] 5.209388e-09

$counts
function gradient 
      61       NA 

$convergence
[1] 0

$message
NULL
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thank you very much for your help and your suggestion. I am using an older R version and will therefore continue with the optim function. – RSale Dec 13 '20 at 19:05
  • I think you have misread the answer. Both (1) and (2) should work without the development version of R. – G. Grothendieck Dec 13 '20 at 21:00