I am trying to get the final model using backward elimination in R, but I got the following error message when I ran the code. Could anyone please help me with this?

base<-lm(Eeff~NDF,data=phuong)
fullmodel<-lm(Eeff~NDF+ADF+CP+NEL+DMI+FCM,data=phuong)
step(full, direction = "backward", trace = FALSE)

> Error in step(full, direction = "backward", trace = FALSE) : 
number of rows in use has changed: remove missing values?
Gorka
hn.phuong

1 Answer

When comparing different submodels, they must all be fitted to the same set of data; otherwise the comparison doesn't make sense. (Consider the extreme situation where you have two predictors A and B, each measured on only half of your observations: the model `y~A+B` can use only the cases where both A and B are present, while the models `y~A` and `y~B` are each fitted to a different half of the data, and those halves may not overlap at all.) Thus, `step` won't let you compare submodels that, because cases containing `NA` values are removed automatically, use different subsets of the original data set.
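
A minimal sketch of that situation, using made-up data (the data frame `toy` and its columns are hypothetical and not from the question). It assumes R's default `na.action` of `na.omit`, under which `lm` drops every row that has an `NA` in any variable the formula uses, so each submodel ends up with a different number of rows:

```r
# Hypothetical toy data: A is missing on the first 10 rows,
# B is missing on the last 10 rows.
set.seed(1)
n <- 30
toy <- data.frame(y = rnorm(n), A = rnorm(n), B = rnorm(n))
toy$A[1:10]  <- NA
toy$B[21:30] <- NA

# Each fit silently uses only the rows that are complete for its own formula.
fit_AB <- lm(y ~ A + B, data = toy)
fit_A  <- lm(y ~ A,     data = toy)
fit_B  <- lm(y ~ B,     data = toy)

nobs(fit_AB)  # 10: only rows 11-20 have both A and B
nobs(fit_A)   # 20: rows 11-30
nobs(fit_B)   # 20: rows 1-20
```

Because the row counts differ, AIC values from these three fits are not comparable, which is the condition `step` guards against.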

Using na.omit on the original data set should fix the problem.

fullmodel <- lm(Eeff ~ NDF + ADF + CP + NEL + DMI + FCM, data = na.omit(phuong))
step(fullmodel, direction = "backward", trace=FALSE ) 

However, if you have a lot of NA values in different predictors, you may end up losing a lot of your data set -- in an extreme case you could lose the entire data set. If this happens you have to reconsider your modeling strategy ...
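
To see how much data is at stake before committing to `na.omit`, something like the following may help. This is only a sketch on a hypothetical data frame `dat`; the column names are invented, and restricting the complete-case filter to the model's own variables is a suggestion that goes beyond what is described above:

```r
# Hypothetical data; `extra` is a column the model does not use
# but which contains many NAs.
dat <- data.frame(
  y     = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.1, 7.3, 8.4),
  x1    = c(1, 2, NA, 4, 5, 6, 7, 8),
  x2    = c(2, 1, 4, 3, 6, 5, 8, 7),
  extra = c(NA, NA, NA, NA, 1, 2, 3, 4)
)

# Count the missing values in each column.
colSums(is.na(dat))

# na.omit() on the whole data frame drops a row if ANY column is NA.
nrow(na.omit(dat))   # 4 of the 8 rows survive

# Keeping rows that are complete for the model variables only loses less.
vars   <- all.vars(y ~ x1 + x2)
dat_cc <- dat[complete.cases(dat[, vars]), ]
nrow(dat_cc)         # 7 of the 8 rows survive

# Every submodel of this fit now sees the same 7 rows, so step() can run.
full_fit <- lm(y ~ x1 + x2, data = dat_cc)
final    <- step(full_fit, direction = "backward", trace = FALSE)
```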

Ben Bolker
  • I have a question: shouldn't it be `fullmodel` instead of `full` in `step(full, direction = "backward", trace=FALSE )`? Am I wrong? – Jilber Urbina Aug 01 '12 at 22:08
  • Yes. I just copied the OP's code without looking at it too carefully. Thanks. – Ben Bolker Aug 01 '12 at 22:58
  • Yeah, it works. Thank you very much, Ben Bolker, for your comments. – hn.phuong Aug 02 '12 at 08:00
  • Another question that arises: could we do the same 'backward elimination' as above, but with a mixed model (`lmer`) instead of linear regression (`lm`)? `fullmodel <- lmer(Eeff ~ NDF + ADF + CP + NEL + DMI + FCM + (1|Study), data = na.omit(phuong))` followed by `step(fullmodel, direction = "backward", trace = FALSE)` – hn.phuong Aug 02 '12 at 08:19
  • I believe `drop1` works for `lmer` fits, but it looks like `step` doesn't. May I also caution you against stepwise approaches? There are *some* contexts where they make sense, but most of the time they're a bad idea -- try Googling "Harrell stepwise" to read some of the critiques. – Ben Bolker Aug 02 '12 at 13:25
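
A rough sketch of the `drop1` suggestion from the last comment, assuming the lme4 package is installed. It uses lme4's bundled `sleepstudy` data as a stand-in, since the poster's data is not available; the model formula here is illustrative only:

```r
# Assumes lme4 is installed; the block is skipped otherwise.
if (requireNamespace("lme4", quietly = TRUE)) {
  library(lme4)
  data("sleepstudy", package = "lme4")

  # Fit by maximum likelihood (REML = FALSE) so that fits differing in
  # their fixed effects can be compared.
  mixed_fit <- lmer(Reaction ~ Days + (1 | Subject),
                    data = sleepstudy, REML = FALSE)

  # Single-term deletions of the fixed effects, with likelihood-ratio tests.
  d1 <- drop1(mixed_fit, test = "Chisq")
  print(d1)
}
```

This performs one round of deletions by hand rather than an automated search, so the caution about stepwise selection applies here as well.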