
I have a question about the statistics in the output of a linear mixed model (fit with the lmer function) compared with the statistics for the estimated marginal means computed from that same model.

Essentially, I am running an LMM comparing the within-subjects effect of different contexts (with "Negative" coded as the baseline) on enjoyment ratings. The LMM output suggests that the difference between negative and polite contexts is not significant, with a p-value of .35. See the screenshot below with the relevant line highlighted:

[Screenshot: LMM output]

However, when I then run the lsmeans function on the same model (with the Holm correction), the p-value for the comparison between Negative and Polite context categories is now .05, and all of the other statistics have changed too. Again, see the screenshot below with the relevant line highlighted:

[Screenshot: LSMeans output]

I'm probably being dense because my understanding of LMMs isn't hugely advanced, and although I've tried to Google the reason for this, I can't find an explanation. I don't think it's the correction itself, because the smaller p-value is the one observed with the Holm correction applied. Why does this happen, and which value should I report, and why?

Thank you for your help!

Samedi

2 Answers


Regression coefficients and marginal means are not one and the same. Once you learn these concepts it'll be easier to figure out which one is more informative and therefore which one you should report.

After we fit a regression by estimating its coefficients, we can predict the outcome $y_i$ given the $m$ input variables $X_i = (X_{i1}, \ldots, X_{im})$. If the inputs are informative about the outcome, the predicted $\hat{y}_i$ is different for different $X_i$. If we average the predictions $\hat{y}_i$ for examples with $X_{ij} = x_j$, we get the marginal effect of the $j$th feature at the value $x_j$. It's crucial to keep track of which inputs are kept fixed (and at what values) and which inputs are averaged over (aka marginalized out).

In your case, contextCatPolite in the coefficients summary is the difference between Polite and Negative when smileType is set to its reference level (no reward, I'd guess). In the emmeans contrasts, Polite - Negative is the average difference over all smileTypes.
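For concreteness, here is a minimal sketch of the two computations. The data frame and column names (dat, enjoyment, subject) are assumptions based on your description, not your actual code:

```r
library(lme4)
library(emmeans)

# Interaction model with a by-subject random intercept (assumed structure)
fit <- lmer(enjoyment ~ contextCat * smileType + (1 | subject), data = dat)

# Coefficient table: contextCatPolite is the Polite - Negative difference
# *at the reference level of smileType* (treatment coding)
summary(fit)

# Estimated marginal means: the Polite - Negative contrast is now
# averaged over the levels of smileType
emm <- emmeans(fit, ~ contextCat)
contrast(emm, method = "pairwise", adjust = "holm")
```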

Interactions have a way of making interpretation more challenging, and your model includes an interaction between smileType and contextCat. See the emmeans vignette Interaction analysis in emmeans.
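In that spirit, with the interaction in the model it is often more informative to look at the context contrast separately within each smile type (again using the assumed names from the sketch above):

```r
# Polite - Negative contrast computed within each level of smileType,
# rather than averaged over them
emm_by <- emmeans(fit, ~ contextCat | smileType)
contrast(emm_by, method = "pairwise", adjust = "holm")
```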

dipetkov

To add to @dipetkov's answer, the coefficients in your LMM are based on treatment coding (sometimes called "dummy" coding). With an interaction in the model, these coefficients are no longer "main effects" in the traditional sense of factorial ANOVA. For instance, if you have:

$$y = b_0 + b_1 X_1 + b_2 X_2 + b_3 (X_1 \times X_2)$$

...$b_1$ is "the effect of $X_1$" only when $X_2 = 0$:

$$y = b_0 + b_1 X_1 + b_2 (0) + b_3 (X_1 \times 0)$$
$$y = b_0 + b_1 X_1$$

Thus, as @dipetkov points out, 1.625 is not the difference between Negative and Polite averaged across all other factors (which is what you get from emmeans). Instead, this coefficient is the difference between Negative and Polite specifically when smileType = 0, i.e., at its reference level.

If you use contrast coding instead of treatment coding, then the coefficients from the regression output would match the estimated marginal means, because smileType = 0 would now correspond to the average across smile types. The coding scheme thus has a huge effect on the estimated values and statistical significance of individual regression coefficients, but it should not affect F-tests based on the reduction in deviance/variance (because no matter how you code it, a given variable explains the same amount of variance).

https://stats.oarc.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis/
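As an illustrative sketch (reusing the assumed names from the first answer's example), switching smileType to sum-to-zero contrasts before fitting makes the treatment-coded context coefficient refer to the average over smile types:

```r
# Sum ("deviation") coding: smileType = 0 now corresponds to the
# average over smile types, so contextCatPolite estimates the
# Polite - Negative difference averaged across smileType levels
dat$smileType <- factor(dat$smileType)
contrasts(dat$smileType) <- contr.sum(nlevels(dat$smileType))

fit_sum <- lmer(enjoyment ~ contextCat * smileType + (1 | subject), data = dat)
summary(fit_sum)

# This should line up with the emmeans contrast, apart from the Holm
# adjustment, which emmeans applies but the plain coefficient t-test does not
```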

  • Just getting back to you on this after a few weeks! Basically, I was wondering if you knew of any way I could calculate omnibus p-values, e.g. the smileType main effect, contextCat main effect, and smileType:contextCat interaction, purely from the output of an LMM? I know I can use the anova function to generate these values for lmer models, but (for various reasons) this doesn't work when I use the robustlmm package. Is there any way of getting these omnibus effects from the p-values or t-values output by robustlmm? – Samedi Aug 09 '22 at 10:09