Why does gnls (from nlme package for R) estimate a curve that doesn't necessarily fit my data?

Question

I'm trying to fit nonlinear curves while specifying a covariance structure based on a phylogeny. To do this, I am using gnls from nlme to fit my model, while the covariance structure is created using corBrownian from ape. The analysis runs fine, but I am finding that gnls doesn't produce a curve with parameters that fit my data well.

To reproduce what is happening, I've created a bogus dataset with a seemingly nonlinear relationship between the x and y variables.

> bogus<-read.table('bogus.txt',header=T)
> library(nlme)
> library(ape)
> bogus
       x         y
t1  11.55906 2.8300968
t2  11.69827 2.3313596
t3  11.76451 2.9851054
t4  12.08264 2.5448035
t5  12.12407 2.4908419
t6  12.25148 2.1676187
t7  12.53406 2.7367371
t8  12.56658 2.9068268
t9  12.63566 2.7088351
t10 12.74374 2.1415571
t11 12.75834 2.3817043
t12 12.82976 2.7454244
t13 12.89204 2.1909294
t14 12.90101 3.2203981
t15 13.14088 2.6648673
t16 13.26564 2.4982850
t17 13.35068 3.1950533
t18 13.46560 3.7170424
t19 14.27643 2.4693087
t20 14.50607 2.5048268
t21 14.77404 3.3430531
t22 15.02226 1.9006652
t23 15.58785 3.0900789
t24 16.54972 2.5029559
t25 16.62484 2.1646191
t26 17.12499 2.6146433
t27 17.82371 3.2210873
t28 18.80876 2.5716595
t29 19.82221 2.7564073
t30 19.97193 2.7730234
t31 20.02177 2.9545869
t32 20.72805 2.9256731
t33 21.23050 2.0299999
t34 21.55609 0.0000000
t35 21.60317 1.9895633
t36 21.62067 1.3089420
t37 21.73584 1.2215530
t38 21.84590 1.1517204
t39 22.04924 0.8980439
t40 22.21454 1.5719580
t41 22.21508 1.9236154
t42 22.25437 2.0794511
t43 22.59725 0.9415377
t44 23.00222 0.5226221
t45 24.15081 0.5897821
t46 24.78190 0.0000000
t47 24.93000 0.5375178
t48 25.16983 0.9432479
t49 25.70231 0.3888160
t50 26.06668 0.0000000
t51 26.52506 0.8258348
t52 26.62164 0.0000000
t53 28.97916 0.0000000
t54 29.29434 0.6803803
t55 29.58485 0.2704070
t56 29.67962 0.0000000
t57 29.85867 0.5700013
t58 30.33632 0.0000000
t59 30.37679 0.4963074
t60 31.17407 0.0000000

Since this is made up data, I don't have a phylogeny for it, so I simulate it using rtree from ape (since the tree will be randomly generated, there will be slight discrepancies, but the overall result shouldn't be too different). Using this tree, I create my covariance structure and run my analysis using the model I specified (called 'mod', here).

> tree<-rtree(60)
> bogus_cov<-corBrownian(1,tree)
> mod<-y~A-(A/(1+(x/K)^S))
> bogus_fit<-gnls(mod,start=c(A=3,K=22,S=35),correlation=bogus_cov,data=bogus)

The results from this analysis:

> summary(bogus_fit)
Generalized nonlinear least squares fit
  Model: mod 
  Data: bogus 
       AIC      BIC    logLik
  131.5422 139.9196 -61.77112

Correlation Structure: corBrownian
 Formula: ~1 
 Parameter estimate(s):
numeric(0)

Coefficients:
     Value Std.Error   t-value p-value
A -2.25642  0.178272 -12.65720  0.0000
K 21.79689  0.193781 112.48183  0.0000
S 33.94954 12.613109   2.69161  0.0093

 Correlation: 
  A      K     
K -0.262       
S  0.450 -0.041

Standardized residuals:
      Min        Q1       Med        Q3       Max 
0.7277937 1.8177889 2.1017363 2.3306128 2.9470426 

Residual standard error: 1.261279 
Degrees of freedom: 60 total; 57 residual

Just at first glance, we can see that the asymptote (A) is placed at -2, which is well below any y value that is in my bogus dataset. Furthermore, based on how my data are arranged, the S parameter should be negative. This is more evident when the data are plotted with the curve with the parameter estimates:

> plot(bogus,ylim=c(-5,5))
> lines(bogus$x,predict(bogus_fit),lwd=1.5,col='red')

I guess my question is, why is gnls not fitting the curve to my data? Isn't the whole point of least squares to minimize SSE? Is this a matter of a bad model? Is there something I'm missing or misunderstanding about nls/gnls in general? Is there a way to fix this so that the curve does fit the data better?

If you know that `S` should be negative, why do you use a positive starting value? If there are local minima of the SSE you can get non-optimal parameter estimates. Other problems are possible too. — Roland, Jul 06 '15 at 15:55
E.g., try `bogus_fit <- gnls(mod,start=c(A=3,K=22,S=-10),correlation=bogus_cov,data=bogus)`. — Roland, Jul 06 '15 at 16:06
@Roland Hmm...your solution worked for the sample dataset I posted, but it doesn't seem to help with my real dataset...What are some of these other problems that you hint to in your first comment? — g.m, Jul 06 '15 at 20:03
It's all about the shape of the SSE hyperplane in the parameter space. You can have local minima, valleys, regions which very small gradients, ... Maybe you should try first to get a good fit without the correlation structure using different optimizers and than you could use parameter estimates from that as starting values for `gnls`. — Roland, Jul 07 '15 at 09:11
So if I run the analysis without a correlation structure, I get a fit that seems reasonable. Changing the optimizer from `nlminb` to `optim`, nets me the same exact reasonable estimates, as does going further and changing the `optimMethod` from `'BFGS'` to `'L-BFGS-B'`. However, using those parameter estimates with a correlation structure still gives me bad estimates (in fact, the same exact bad estimates, despite using the different starting values). — g.m, Jul 07 '15 at 14:56
So, either the estimates are not bad considering your correlation structure, or your correlation structure is not reasonable, or there are numeric problems. — Roland, Jul 07 '15 at 15:31
Thanks for your help @Roland. After some discussion with a colleague, I've come to the conclusion that, as you suggested, the correlation structure I specified is unreasonable. That and your earlier suggestion about the SSE hyperplane was very informative. I appreciate it. — g.m, Jul 08 '15 at 18:01

Why does gnls (from nlme package for R) estimate a curve that doesn't necessarily fit my data?

0 Answers0