
My current linear model is: fit <- lm(ES ~ Area + Anear + Dist + DistSC + Elevation)

I have been asked to further this by:

Fit a linear model for ES using the five explanatory variables and include up to quadratic terms and first order interactions (i.e. allow Area^2 and Area*Elevation, but don't allow Area^3 or Area*Elevation*Dist).

From my research I can add terms such as + I(Area^2) and + Area*Elevation, but writing them all out would make a huge list.

Assuming I am understanding the question correctly, I would be adding 5 squared terms and 10 interaction terms, giving 20 terms in total (counting the 5 main effects). Or do I not need all of these?

Is that really the most efficient way of going about it?
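
Written out by hand (using : for the interaction-only terms, since the main effects are already in the model), that full list would look roughly like this:

# fit_manual is just a placeholder name; 5 main effects + 5 squares + 10 interactions = 20 terms
fit_manual <- lm(ES ~ Area + Anear + Dist + DistSC + Elevation +
                   I(Area^2) + I(Anear^2) + I(Dist^2) + I(DistSC^2) + I(Elevation^2) +
                   Area:Anear + Area:Dist + Area:DistSC + Area:Elevation +
                   Anear:Dist + Anear:DistSC + Anear:Elevation +
                   Dist:DistSC + Dist:Elevation + DistSC:Elevation)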

EDIT:

Note that I am planning to carry out a stepwise regression between the null model and the full model afterwards. I seem to be having trouble with this when using poly.

1 Answer


Look at ?formula to further your education:

fit <- lm(ES ~ (Area + Anear + Dist + DistSC + Elevation)^2)

Those will not be squared terms but rather part of what you were asked to provide: all the two-way interactions (and the main effects). Formula "mathematics" is different from the ordinary use of powers. To add the squared terms in a manner that allows proper statistical interpretation, use poly:

fit <- lm(ES ~ (Area + Anear + Dist + DistSC + Elevation)^2 +
            poly(Area, 2) + poly(Anear, 2) + poly(Dist, 2) +
            poly(DistSC, 2) + poly(Elevation, 2))
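
As an aside, a quick way to check which terms a formula like this expands to is to inspect its terms object (terms() only parses the formula, so no fitting is needed):

# ^2 on a formula gives main effects plus all two-way interactions -- no squared terms
attr(terms(ES ~ (Area + Anear + Dist + DistSC + Elevation)^2), "term.labels")
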
  • `poly(Area, Anear, Dist, DistSC, Elevation, degree = 2)` takes care of all of the terms of interest. – Dason Apr 02 '14 at 11:50
  • Thanks for both of your comments. I assume my output "poly(Area, Anear, Dist, DistSC, Elevation, degree = 2)1.0.0.0.0" means Area whereas "poly(Area, Anear, Dist, DistSC, Elevation, degree = 2)0.0.0.2.0" means DistSC^2. – JRSR Apr 02 '14 at 12:27
  • My stepwise regression does not seem to like the use of poly, whereas if I just have all the terms it seems to run fine. Can I fix this? – JRSR Apr 02 '14 at 13:29
  • 1
    The experts in the R world do not think that stepwise regression is a valid form of inference, so it doesn't surprise me that their recommended approach will not fit well with stepwise approaches. – IRTFM Apr 02 '14 at 14:37
  • 1
    Furthermore, the use of I(Area^2) will _never_ be valid in the context of stepwise regression, even leaving aside issues of the the invalidity of stepwise methods. This is due to the fact that most of the time X^2 will be closely correlated with X, and tests for curvature will be contaminated by that reality. You desperately need to consult a statistician. – IRTFM Apr 02 '14 at 14:57
  • Well after fitting the linear model to allow up to quadratic terms the question is: Carry out a stepwise regression to find the best model using the AIC as your criterion. You should try starting with the null model and adding variables, and with the full model and removing variables. Do you reach the same optimum model both times? Explain which model you would select, carefully defining your chosen model. I had thought step(fit0,direction="forward",scope=(~ poly as above)) and step(fitnew,direction="backward") where fit0 is the null and fitnew is the full model would work but I'm stuck. – JRSR Apr 02 '14 at 15:16
  • You might want to look at the examples in stepAIC in the MASS package. When Ripley does it, he uses scale on the quadratic terms. That centers and norms the quadratic terms which should help to avoid inappropriate correlations with the linear term: `birthwt.step2 <- stepAIC(birthwt.glm, ~ .^2 + I(scale(age)^2) + I(scale(lwt)^2), trace = FALSE)`. I do not think his inclusion of stepAIC in MASS is an endorsement of stepwise methods, though. – IRTFM Apr 02 '14 at 18:35
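
Putting the last few comments together, a minimal sketch of the forward and backward searches with stepAIC might look like the following (fit0 and fitnew follow the naming used in the comments; fwd and bwd are arbitrary names; the quadratic terms are scaled as in Ripley's example):

library(MASS)  # provides stepAIC

# Null model, and full model with two-way interactions plus scaled quadratic terms
fit0   <- lm(ES ~ 1)
fitnew <- lm(ES ~ (Area + Anear + Dist + DistSC + Elevation)^2 +
               I(scale(Area)^2) + I(scale(Anear)^2) + I(scale(Dist)^2) +
               I(scale(DistSC)^2) + I(scale(Elevation)^2))

# Forward search from the null model, with the full model's formula as the upper scope
fwd <- stepAIC(fit0, scope = formula(fitnew), direction = "forward", trace = FALSE)

# Backward search from the full model
bwd <- stepAIC(fitnew, direction = "backward", trace = FALSE)

# Compare the two selected models
formula(fwd)
formula(bwd)

If the two formulas match, both searches reached the same optimum; if not, that difference is what the assignment asks you to discuss.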