Approach for comparing linear, non-linear and differently parameterized non-linear models

Question

I am looking for an approach to compare linear, non-linear and differently parameterized non-linear models. For this:

#Packages
library(nls2)
library(minpack.lm)

# Data set - Diameter as a function of Feature and Age
Feature<-sort(rep(c("A","B"),22))
Age<-c(60,72,88,96,27,
36,48,60,72,88,96,27,36,48,60,72,
88,96,27,36,48,60,27,27,36,48,60,
72,88,96,27,36,48,60,72,88,96,27,
36,48,60,72,88,96)
Diameter<-c(13.9,16.2,
19.1,19.3,4.7,6.7,9.6,11.2,13.1,15.3,
15.4,5.4,7,9.9,11.7,13.4,16.1,16.2,
5.9,8.3,12.3,14.5,2.3,5.2,6.2,8.6,9.3,
11.3,15.1,15.5,5,7,7.9,8.4,10.5,14,14,
4.1,4.9,6,6.7,7.7,8,8.2)
d <- data.frame(Feature, Age, Diameter)
str(d)

I will create three different models: two non-linear models with specific parameterizations and one linear model. In my example I assume that all the coefficients of each model are significant (without considering the real results).

# Model 1 non-linear
e1<- Diameter ~ a1 * Age^a2 
# Levenberg-Marquardt algorithm
m1 <-  nlsLM(e1, data = d,
     start = list(a1 = 0.1, a2 = 10),
     control = nls.lm.control(maxiter = 1000))
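A side note on model 1: with `a2 = 10` as a start value, `Age^a2` is of the order of 1e19 at `Age = 96`, so the optimizer starts very far from any plausible curve. One common way to get starting values for a power model is to linearize it, because log(Diameter) = log(a1) + a2 * log(Age) can be fitted with `lm()`. A minimal sketch of that idea, assuming the data frame has `Age` and `Diameter` columns; `power_start` and `fit_power` are hypothetical helper names, and base `nls()` is used here instead of `nlsLM()`:

```r
# Hypothetical helper: starting values for Diameter ~ a1 * Age^a2 taken from
# the log-linearized model log(Diameter) = log(a1) + a2 * log(Age).
power_start <- function(data) {
  ll <- lm(log(Diameter) ~ log(Age), data = data)
  list(a1 = unname(exp(coef(ll)[1])),  # back-transformed intercept estimates a1
       a2 = unname(coef(ll)[2]))       # log-log slope estimates a2
}

# Hypothetical helper: fit the power model on the original scale with base nls(),
# started from the log-log estimates (the same list could be given to nlsLM()).
fit_power <- function(data) {
  nls(Diameter ~ a1 * Age^a2, data = data, start = power_start(data))
}
```

With the question's data this would be used as `m1b <- fit_power(d)`, followed by `coef(m1b)` and `deviance(m1b)`.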

# Model 2 linear
m2<-lm(Diameter ~ Age, data=d)

# Model 3 another non-linear
e2<- Diameter ~ a1^(-Age/a2)
m3 <-  nls2(e2, data = d, alg = "brute-force",
     start = data.frame(a1 = c(-1, 1), a2 = c(-1, 1)),
     control = nls.control(maxiter = 1000))
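As far as I understand nls2, `alg = "brute-force"` with a two-row `start` data frame treats the two rows as lower and upper bounds, evaluates the residual sum of squares on a grid of roughly `maxiter` points between them and returns the best grid point without refining it further. The sketch below reproduces that idea in base R for model 3; `grid_search` and its arguments are hypothetical, and this is not nls2's actual implementation:

```r
# Illustrative grid search for Diameter ~ a1^(-Age/a2): evaluate the residual
# sum of squares at every point of a regular grid between the lower and upper
# bounds and keep the point with the smallest value.
grid_search <- function(data, lower, upper, n_per_dim = 30) {
  grid <- expand.grid(
    a1 = seq(lower[["a1"]], upper[["a1"]], length.out = n_per_dim),
    a2 = seq(lower[["a2"]], upper[["a2"]], length.out = n_per_dim)
  )
  rss <- apply(grid, 1, function(p) {
    pred <- p[["a1"]]^(-data$Age / p[["a2"]])
    sum((data$Diameter - pred)^2)
  })
  # which.min() discards NaN values, e.g. a negative a1 raised to a
  # non-integer power
  best <- which.min(rss)
  list(par = unlist(grid[best, ]), rss = rss[best])
}
```

For example, `grid_search(d, lower = c(a1 = -1, a2 = -1), upper = c(a1 = 1, a2 = 1))` mirrors the bounds used in the `nls2()` call above. The best grid point is normally only a starting value for a proper fit, for instance with `nls()`.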

Now my idea is to find the "better" model despite the different nature of each model. I tried a proportional measure: I compare each model's residual sum of squares with the total sum of squares of the data set. Doing this for models 1 and 2 gives:

## Pseudo-R2 approach (1 - RSS/TSS)

#Model 1
SQEm1<-summary(m1)$sigma^2*summary(m1)$df[2]# residual sum of squares (RSS) of model 1
SQTm1<-var(d$Diameter)*(length(d$Diameter)-1)# total sum of squares (TSS) of the data set
R1<-1-SQEm1/SQTm1
R1

#Model 2
SQEm2<-summary(m2)$sigma^2*summary(m2)$df[2]# residual sum of squares (RSS) of model 2
R2<-1-SQEm2/SQTm1
R2
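The two quantities computed above are the residual sum of squares (which `deviance()` returns directly for both `lm` and `nls` fits) and the total sum of squares, so the same measure can be written as a small helper. A sketch, with `pseudo_r2` as a hypothetical name; for an `lm` fit with an intercept it reproduces the usual R-squared, while for a non-linear fit without an intercept it can even be negative, and in neither case does it account for the number of parameters:

```r
# Pseudo-R^2 = 1 - RSS/TSS.
# deviance() gives the residual sum of squares for lm and nls fits alike;
# the denominator is the total sum of squares of the response around its mean.
pseudo_r2 <- function(fit, y) {
  1 - deviance(fit) / sum((y - mean(y))^2)
}
```

With the question's objects, `pseudo_r2(m1, d$Diameter)` and `pseudo_r2(m2, d$Diameter)` should match `R1` and `R2`.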

In my (admittedly limited) opinion, model 1 is "better" than model 2. My question is: does this approach sound correct? Is there any way to compare these types of models?

Thanks in advance!

this way of comparing models doesn't penalize models for complexity and risks overfitting the data. you would be safer comparing your models via cross-validation — gfgm, Mar 28 '19 at 14:01
IF the models have the same number of parameters as is the case here then you can just use the sum of squares of residuals: `deviance(m1); deviance(m2)` where lower is better. Also graph the fit superimposed on the data and that may make it obvious which model fits best. Be sure to sort the data on Age so that the plots come out right. — G. Grothendieck, Mar 28 '19 at 17:29
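Following the second comment, a sketch that superimposes each fitted curve on the data and returns each model's residual sum of squares. Instead of sorting `d` on `Age`, it predicts on an ordered grid of ages, which avoids the zig-zag lines that unsorted data produce. `plot_fits` is a hypothetical helper and expects a named list of fitted `lm`/`nls` objects:

```r
# Scatter plot of the data with one fitted curve per model.
# Predictions are made on an ordered grid of Age values so lines() draws
# each curve from left to right.
plot_fits <- function(data, fits, cols = seq_along(fits) + 1) {
  plot(Diameter ~ Age, data = data, pch = 16, col = "grey40")
  age_grid <- data.frame(Age = seq(min(data$Age), max(data$Age), length.out = 200))
  for (i in seq_along(fits)) {
    lines(age_grid$Age, predict(fits[[i]], newdata = age_grid),
          col = cols[i], lwd = 2)
  }
  legend("topleft", legend = names(fits), col = cols, lwd = 2, bty = "n")
  # residual sum of squares of each fit; lower is better
  invisible(sapply(fits, deviance))
}
```

For the question's models this would be `plot_fits(d, list(model1 = m1, model2 = m2))`.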

Leprechault · Accepted Answer · 2019-03-28T19:07:26.720

#First cross-validation approach ------------------------------------------

#Cross-validation model 1
set.seed(123) # for reproducibility

n <- nrow(d)
frac <- 0.8
ix <- sample(n, frac * n) # indexes of in sample rows

e1<- Diameter ~ a1 * Age^a2 
# Levenberg-Marquardt algorithm
m1 <-  nlsLM(e1, data = d,
     start = list(a1 = 0.1, a2 = 10),
     control = nls.lm.control(maxiter = 1000), subset = ix)# in sample model

BOD.out <- d[-ix, ] # out of sample data
pred <- predict(m1, newdata = BOD.out)
act <- BOD.out$Diameter
RSS1 <- sum( (pred - act)^2 )
RSS1
#[1] 56435894734

#Cross-validation model 2
m2<-lm(Diameter ~ Age, data=d, subset = ix)# in sample model
BOD.out2 <- d[-ix, ] # out of sample data
pred <- predict(m2, newdata = BOD.out2)
act <- BOD.out2$Diameter
RSS2 <- sum( (pred - act)^2 )
RSS2
#[1] 19.11031
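A single 80/20 split depends heavily on which rows happen to land in the test set. A k-fold variant, sketched below under the assumption of 5 folds, tests every row exactly once and sums the out-of-sample squared errors. `kfold_rss` is a hypothetical helper, and it refits model 1 from log-log starting values rather than from `a1 = 0.1, a2 = 10`, which is an assumption of this sketch:

```r
# k-fold cross-validation of the power model (model 1) and the linear model
# (model 2). Returns the out-of-sample residual sum of squares summed over
# all folds for each model; lower is better.
kfold_rss <- function(data, k = 5, seed = 123) {
  set.seed(seed)  # note: changes the global random number stream
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold labels
  rss <- c(power = 0, linear = 0)
  for (f in seq_len(k)) {
    train <- data[folds != f, ]
    test  <- data[folds == f, ]
    # power model started from the log-log regression on the training fold
    ll <- lm(log(Diameter) ~ log(Age), data = train)
    m_pow <- nls(Diameter ~ a1 * Age^a2, data = train,
                 start = list(a1 = unname(exp(coef(ll)[1])),
                              a2 = unname(coef(ll)[2])))
    m_lin <- lm(Diameter ~ Age, data = train)
    rss["power"]  <- rss["power"] +
      sum((test$Diameter - predict(m_pow, newdata = test))^2)
    rss["linear"] <- rss["linear"] +
      sum((test$Diameter - predict(m_lin, newdata = test))^2)
  }
  rss
}
```

Calling `kfold_rss(d)` gives one out-of-sample RSS per model; repeating it with different `seed` values shows how stable the ranking is.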

# Sum of squares approach -----------------------------------------------
# note: m1 and m2 are now the in-sample (80%) fits from above
deviance(m1)
#[1] 238314429037

deviance(m2)
#[1] 257.8223

Based on the comments from gfgm and G. Grothendieck: RSS2 is lower than RSS1, and deviance(m2) is lower than deviance(m1) as well, so model 2 is better than model 1.
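gfgm's point about model complexity can also be addressed with an information criterion. Both `lm` and `nls` objects have `logLik()` methods based on a Gaussian likelihood, so `AIC()` can compare them side by side (lower is better). A sketch with a hypothetical helper `compare_ic`, which expects named fits to the same data; because models 1 and 2 both have two coefficients, AIC ranks them in the same order as the deviance, and it only adds information when the parameter counts differ:

```r
# Side-by-side comparison of fitted lm/nls models of the same response.
# n_par counts the coefficients plus the residual variance, as logLik() does.
compare_ic <- function(...) {
  fits <- list(...)
  data.frame(model = names(fits),
             n_par = sapply(fits, function(m) attr(logLik(m), "df")),
             RSS   = sapply(fits, deviance),
             AIC   = sapply(fits, AIC),
             row.names = NULL)
}
```

For example, `compare_ic(model1 = m1, model2 = m2)`.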
