I'm simultaneously trying to understand R's predict() function and the 'effects' package effect() function. Essentially, I'm running a regression to test the interaction of two dichotomous IVs on a DV while controlling for two continuous covariates. In my actual dataset, the interaction is significant and so now I would like to plot the interaction. Because I have covariates in my model, I should plot the means after controlling for these other variables (i.e. estimated marginal means in SPSS). I haven't done this in R before and while searching I've come to expect I should be able to obtain the values I need for graphing with either the effect() or the predict() functions. Therefore, I tried doing it with each on a randomly generated dataset:
> set.seed(100)
> test <- data.frame(iv1 = factor(round(rnorm(200, mean=.5, sd=.25), 0), levels=c(0,1), labels=c("A","B")), iv2 = factor(round(rnorm(200, mean=.5, sd=.25), 0), levels=c(0,1), labels=c("C","D")), cv1 = rnorm(200, mean=4, sd=1), cv2 = rnorm(200, mean=3, sd=1), dv = rnorm(200, mean=5, sd=1))
> mod <- lm(dv ~ cv1 + cv2 + iv1*iv2, data = test)
> new <- with(test, expand.grid(iv1 = levels(iv1), iv2 = levels(iv2), cv1 = mean(cv1), cv2 = mean(cv2)))
> test$pv <- predict(mod, newdata = new)
> tapply(test$pv, list(test$iv1, test$iv2), mean)
C D
A 5.076842 5.086218
B 5.025614 5.065399
> effect("iv1:iv2", mod)
iv1*iv2 effect
iv2
iv1 C D
A 5.019391 5.167275
B 5.216955 4.855195
Because I'm getting different results I exported the data to SPSS and ran an ANOVA doing the same thing and looked at the estimated marginal means (EMMEANS). These were identical to the results given by effect() in R.
SPSS syntax:
DATASET ACTIVATE DataSet1.
RECODE iv1 iv2 ('A'=-1) ('B'=1) ('C'=-1) ('D'=1) INTO iv1_recode iv2_recode.
EXECUTE.
UNIANOVA dv BY iv1_recode iv2_recode WITH cv1 cv2
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/EMMEANS=TABLES(OVERALL) WITH(cv1=MEAN cv2=MEAN)
/EMMEANS=TABLES(iv1_recode) WITH(cv1=MEAN cv2=MEAN)
/EMMEANS=TABLES(iv2_recode) WITH(cv1=MEAN cv2=MEAN)
/EMMEANS=TABLES(iv1_recode*iv2_recode) WITH(cv1=MEAN cv2=MEAN)
/PRINT=DESCRIPTIVE
/CRITERIA=ALPHA(.05)
/DESIGN=cv1 cv2 iv1_recode iv2_recode iv1_recode*iv2_recode.
As a check, the SPSS output for the EMMEANS says, "Covariates appearing in the model are evaluated at the following values: cv1 = 3.996208827095569, cv2 = 3.052881951477868." These are identical to the values for the covariates that I used with predict:
> new
iv1 iv2 cv1 cv2
1 A C 3.996209 3.052882
2 B C 3.996209 3.052882
3 A D 3.996209 3.052882
4 B D 3.996209 3.052882
So what am I failing to understand? Or am I doing something stupid here (a distinct possibility)? This could be me not grasping what an estimated marginal mean is.
Any help is greatly appreciated!