
I am trying to fit a power function with a negative exponent (y = a*x^b) to some data in R, but the fitted line looks too high compared to the data, whereas the fit I get using Excel's built-in power trendline looks more believable. Can someone tell me why? I've tried using the nls() function and also optim(), and I get similar parameters from both methods, but both fitted curves look too high.

   x    <- c(5.96, 12.86, 8.40, 2.03, 12.84, 21.44, 21.45, 19.97, 8.92, 25.00, 19.90, 20.00, 20.70, 16.68, 14.90, 26.00, 22.00, 22.00, 10.00, 5.70, 5.40, 3.20, 7.60, 0.59, 0.14, 0.85, 9.20, 0.79, 1.40, 2.68, 1.91)
   y    <- c(5.35, 2.38, 1.77, 1.87, 1.47, 3.27, 2.01, 0.52, 2.72, 0.85, 1.60, 1.37, 1.48, 0.39, 2.39, 1.83, 0.71, 1.24, 3.14, 2.16, 2.22, 11.50, 8.32, 38.98, 16.78, 32.66, 3.89, 1.89, 8.71, 9.74, 23.14)

    xy.frame <- data.frame(x,y)

    nl.fit <- nls(formula=(y ~ a * x^b), data=xy.frame, start = c(a=10, b=-0.7))

    a.est <- coef(nl.fit)[1]
    b.est <- coef(nl.fit)[2]

    plot(x=xy.frame$x,y=xy.frame$y)

    # curve looks too high
    curve(a.est * x^b.est, add = TRUE)
    # these parameters from Excel seem to fit better
    curve(10.495 * x^-0.655, add = TRUE)

[figure: scatter of the data with the nls fit and the Excel fit; the nls curve sits higher]

    # alternatively use optim()
    # starting values for a, b, and sigma
    theta.init <- c(1000, -0.5, 50)

    # negative log-likelihood assuming y ~ Normal(a * x^b, sigma)
    exp.nll <- function(theta, data){
      a      <- theta[1]
      b      <- theta[2]
      sigma  <- theta[3]
      obs.y  <- data$y
      x      <- data$x
      pred.y <- a * x^b
      nll <- -sum(dnorm(x = obs.y, mean = pred.y, sd = sigma, log = TRUE))
      nll
    }

    fit.optim <- optim(par = theta.init, fn = exp.nll, method = "BFGS", data = xy.frame)

    plot(x=xy.frame$x,y=xy.frame$y)

    # still looks too high (this time plotting the optim estimates)
    a.opt <- fit.optim$par[1]
    b.opt <- fit.optim$par[2]
    curve(a.opt * x^b.opt, add = TRUE)

[figure: scatter of the data with the optim fit, which also looks too high]


1 Answer


The reason you're seeing this behavior is that the curves that look "too high" actually have much lower sums of squared errors (SSE) than the curve from Excel:

    # Fit from nls
    sum((y - a.est * x^b.est)^2)
    # [1] 1588.313

    # Fit from Excel
    sum((y - 10.495 * x^-0.655)^2)
    # [1] 1981.561

The reason nls favors the higher curve is that it works to avoid huge squared errors at small x values, where the y values are largest, at the cost of slightly larger errors at large x values.
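To see this concretely, compare the residuals of the two fits at the smallest x values (a quick sketch reusing `a.est` and `b.est` from the question):

    # the smallest x values carry the largest y values, so they dominate the SSE
    small <- order(x)[1:5]
    cbind(x = x[small], y = y[small],
          nls.resid   = y[small] - a.est * x[small]^b.est,
          excel.resid = y[small] - 10.495 * x[small]^-0.655)

One way to address this might be to apply a log-log transformation: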

    mod <- lm(log(y) ~ log(x))
    (a.est2 <- exp(coef(mod)["(Intercept)"]))
    # (Intercept)
    #    10.45614
    (b.est2 <- coef(mod)["log(x)"])
    #     log(x)
    # -0.6529741

These are quite close to the coefficients from Excel, and yield a more visually appealing fit (despite the worse performance on the sum-of-squared-errors metric).
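As a quick check of that trade-off, a sketch reusing `a.est2` and `b.est2` from above computes the SSE of the back-transformed fit on the original scale and redraws the curve:

    # SSE of the back-transformed log-log fit; per the comparison above,
    # this comes out larger than the nls SSE of 1588.313
    sum((y - a.est2 * x^b.est2)^2)

    # redraw the data with the back-transformed fit
    plot(x = xy.frame$x, y = xy.frame$y)
    curve(a.est2 * x^b.est2, add = TRUE)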

[figure: scatter of the data with the back-transformed log-log fit]

  • Just out of curiosity, if Excel isn't trying to minimize the SSE, what criterion is it using? – eipi10 Oct 21 '15 at 02:07
  • @eipi10 Though I'm not positive, [it looks like](http://www.real-statistics.com/regression/power-regression/) it is using a log-log transformation, as well. Therefore, it's minimizing the SSE when predicting `log(y)` instead of minimizing the SSE when predicting `y`. – josliber Oct 21 '15 at 02:16
  • I also had issues with this negative-exponent fit having different outputs in R and in Excel -- using the log-log transformation in R yielded the same results as Excel. Another case where Excel does some hidden arithmetic to make the results "look" nicer without informing the user! – user3386170 Jul 25 '22 at 14:39
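A minimal sketch of what the comments above describe, assuming Excel's power trendline minimizes squared error on the log scale; since `log(a)` and `b` are just a reparametrization of a linear model in `log(x)`, this should reproduce the `lm()` coefficients:

    # fitting on the log scale with nls recovers the lm(log(y) ~ log(x)) fit
    log.fit <- nls(log(y) ~ log(a) + b * log(x), data = xy.frame, start = c(a = 10, b = -0.7))
    coef(log.fit)
    # a ~ 10.456, b ~ -0.653 (matching a.est2 and b.est2 above)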