
I have data from an experiment with two conditions (dichotomous IV: 'condition'). I also want to make use of another IV which is metric ('hh'). My DV is also metric ('attention.hh'). I've already run a multiple regression model with an interaction of my two IVs. For that, I centered the metric IV like this:

hh.cen <- as.numeric(scale(data$hh, scale = FALSE))
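
(A side note: scale(..., scale = FALSE) just subtracts the mean and returns a one-column matrix, which is why it is wrapped in as.numeric(). A hand-rolled equivalent, for reference:)

# mean-center the metric IV by hand; scale() centers with colMeans(..., na.rm = TRUE)
hh.cen <- data$hh - mean(data$hh, na.rm = TRUE)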

with these variables I ran the following analysis:

model.hh <- lm(attention.hh ~ hh.cen * condition, data = data)
summary(model.hh)

The results are as follows:

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        0.04309    3.83335   0.011    0.991
hh.cen             4.97842    7.80610   0.638    0.525
condition          4.70662    5.63801   0.835    0.406
hh.cen:condition -13.83022   11.06636  -1.250    0.215

However, the theory behind my analysis tells me that I should expect a quadratic relation between my metric IV (hh) and the DV (but only in one condition).

Looking at the plot, one could at least suspect such a relation:

[plot]
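
(For reference, a sketch of how such a plot could be drawn with ggplot2, assuming the variables named above; the quadratic smoother is fitted separately per condition:)

library(ggplot2)
# scatterplot of attention.hh against hh, with a quadratic fit per condition
ggplot(data, aes(x = hh, y = attention.hh, colour = factor(condition))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE)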

Of course I want to test this statistically. However, I'm now struggling with how to specify the regression model.

I have two solutions that I think should work, but they lead to different outcomes, and unfortunately I don't know which one is right. I know that by including interactions (and 3-way interactions) in the model, I also have to include all simple/main effects as well.
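
In R's formula syntax, * is just shorthand for the main effects plus their interaction, so writing the terms out explicitly should give an identical fit. A quick sanity check (a sketch reusing the variables from above; the model names are only illustrative):

# a * b expands to a + b + a:b, so these two calls fit the same model
m.star     <- lm(attention.hh ~ hh.cen * condition, data = data)
m.explicit <- lm(attention.hh ~ hh.cen + condition + hh.cen:condition, data = data)
all.equal(coef(m.star), coef(m.explicit))  # should return TRUE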

  1. Solution: Including all terms on their own:

Therefore, I first center the DV (in the same way as the IV) and compute the squared centered IV:

attention.hh.cen <- as.numeric(scale(data$attention.hh, scale = FALSE))
hh.sqr <- hh.cen^2

Now I can fit the linear model:

sqr.model.1 <- lm(attention.hh.cen ~ condition + hh.cen + hh.sqr + condition:hh.cen + condition:hh.sqr, data = data)

summary(sqr.model.1)

This leads to the following outcome:

Call:
lm(formula = attention.hh.cen ~ condition + hh.cen + hh.sqr + 
    (condition:hh.cen) + (condition:hh.sqr), data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-53.798 -14.527   2.912  13.111  49.119 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)  
(Intercept)       -1.3475     3.5312  -0.382   0.7037  
condition         -9.2184     5.6590  -1.629   0.1069  
hh.cen             4.0816     6.0200   0.678   0.4996  
hh.sqr             5.0555     8.1614   0.619   0.5372  
condition:hh.cen  -0.3563     8.6864  -0.041   0.9674  
condition:hh.sqr  33.5489    13.6448   2.459   0.0159 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 20.77 on 87 degrees of freedom
Multiple R-squared:  0.1335,    Adjusted R-squared:  0.08365 
F-statistic:  2.68 on 5 and 87 DF,  p-value: 0.02664

  2. Solution: let R include all main effects of the interaction by using the * operator:

    sqr.model.2 <- lm(attention.hh.cen ~ condition * I(hh.cen^2), data = data)

    summary(sqr.model.2)

IMHO, this should also be fine -- however, the output is not the same as the one produced by the code above:

Call:
lm(formula = attention.hh.cen ~ condition * I(hh.cen^2), data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.297 -13.353   2.508  12.504  49.740 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)  
(Intercept)             -1.300      3.507  -0.371   0.7117  
condition               -8.672      5.532  -1.567   0.1206  
I(hh.cen^2)              4.490      8.064   0.557   0.5791  
condition:I(hh.cen^2)   32.315     13.190   2.450   0.0162 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 20.64 on 89 degrees of freedom
Multiple R-squared:  0.1254,    Adjusted R-squared:  0.09587 
F-statistic: 4.252 on 3 and 89 DF,  p-value: 0.007431

I'd rather go with Solution 1, but I'm not sure about that.

Maybe someone has a better solution or can help me out?

Mathias
  • I think this is a methodology question that would be better asked on [CrossValidated](http://stats.stackexchange.com/). SO is primarily for focused programming questions. – lmo May 11 '16 at 12:02
  • Actually I thought that's a question for R enthusiasts, as it depends on R whether you have to include all variables of an interaction on their own, or not (using R)... – Mathias May 11 '16 at 12:10
  • But isn't the decision whether or not to include them primarily based on statistical evidence? – lmo May 11 '16 at 12:14
  • Sure. Ok maybe that's a misunderstanding. The question is not whether or not I need to include the linear term (main effects). This question has already been discussed elsewhere: http://stats.stackexchange.com/questions/28730/does-it-make-sense-to-add-a-quadratic-term-but-not-the-linear-term-to-a-model My question is rather, HOW I compute the model using R - does that make sense? – Mathias May 11 '16 at 12:16
  • sort of. One thing you are missing in solution 2, is a linear hh.cen term. `I(hh.cen^2)` squares hh.cen and then adds it as a covariate. As hh.cen is present in solution 1, it should probably also be present in solution 2. – lmo May 11 '16 at 12:24
  • This means that solution 1 is "right", since main effects should always (or mostly) be entered into the model as well right? see e.g. http://www.jeremydawson.co.uk/slopes.htm – Mathias May 11 '16 at 12:36
  • From everything I've seen, including a second order (interaction or quadratic) term without including the first order term(s) that compose it should be rare and requires explicit justification from theory of the specific field. – lmo May 11 '16 at 12:45
  • Ok, this sounds good to me and is a perfect take home message for me so far! thanks for your help! – Mathias May 11 '16 at 12:49
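
Putting the take-home message from the comments into code: a minimal sketch (the model names are only placeholders, and it assumes the centered variables created above) of the full model with both the linear and the quadratic term plus their interactions with condition, followed by a nested-model test of whether the quadratic part improves on the purely linear model:

# '*' distributes over the parenthesized sum, so this expands to
# condition + hh.cen + I(hh.cen^2) + condition:hh.cen + condition:I(hh.cen^2),
# i.e. the same terms as sqr.model.1
sqr.model.full <- lm(attention.hh.cen ~ condition * (hh.cen + I(hh.cen^2)), data = data)
summary(sqr.model.full)

# does adding the quadratic term and its interaction with condition improve the fit
# over the purely linear interaction model?
lin.model <- lm(attention.hh.cen ~ condition * hh.cen, data = data)
anova(lin.model, sqr.model.full)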
