
I have a generalized mixed model with 2 factors (fac1, 2 levels; fac2, 3 levels) and 4 continuous variables (x1, x2, x3, x4) as fixed effects, and a continuous response. I am interested in answering:

  1. Are the main effects (slopes) of x1-x4 significant, ignoring fac1 and fac2?
  2. Are the fac1 and fac2 levels significantly different from the model mean and from each other?
  3. Do the slopes differ between fac1 levels, fac2 levels, and fac1*fac2 combinations?

This means I would need to include interactions in my model (random effects ignored here), say:

Y~x1+x2+x3+x4+fac1+fac2+x1:fac1+x2:fac1+x3:fac1+x4:fac1+x1:fac2+x2:fac2+x3:fac2+x4:fac2

but now my coefficients for x1-x4 are based on my reference level, and interpreting them as overall main effects is not possible. Also, do I have to include xi:fac1:fac2+fac1:fac2 in my model as well to answer 3)? Is there an R package that can do this?

I thought about refitting the model (e.g. without the interactions) to answer 1), but the number of data points in each factor level is not the same, so ignoring this in Y~x1+x2+x3+x4, the slope of the most common factor combination may dominate the result and the inference. I know you can use contrasts, e.g. coding a factor with 2 levels as -0.5/0.5 instead of the 0/1 dummy coding, but I am not sure what that would look like in this case.

Would it be better to ease the model by combining the factors first, e.g.

fac3<-interaction(fac1,fac2) #and then 
Y~x1+x2+x3+x4+x1:fac3+x2:fac3+x3:fac3+x4:fac3
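For what it's worth, I know the -0.5/0.5 coding itself can be set per factor via contrasts() (a minimal sketch with made-up levels, not my actual data):

```r
# Effect coding a 2-level factor as +0.5 / -0.5 instead of 0 / 1
fac1 <- factor(c("a", "b", "a", "b"))
contrasts(fac1) <- contr.sum(2) / 2  # contr.sum gives +1/-1; halve it for +0.5/-0.5
contrasts(fac1)
```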

But how would I answer 1-3 from that model?

Thanks a lot for your advice

JmO

1 Answer


I think you have to take a step back and ask yourself exactly which hypotheses you want to test here. Taken word for word, your 3-point list results in a lot (!) of hypothesis tests, some of which can be done in the same model, and some of which require different parameterizations. Given that the question at hand is about the hypotheses and not how to code them in R, this is more about statistics than programming and may be better moved to CrossValidated.
Nevertheless, for starters, I would propose the following procedure:

  1. To test x1-x4 alone, just add all of them to your model, then use drop1() to check which of them actually add to the model fit.
  2. In order to reduce the number of hypothesis tests (and different models to fit), I suggest you also test, for each factor and the interaction as a whole, whether it is relevant. Again, add all three terms (both factors and their interaction, so just fac1*fac2 if they are formatted as factors) to the model and use drop1().
  3. This point alone includes many potential hypotheses/contrasts to test. Depending on parameterization (dummy or effect coding), for each of the 4 continuous predictors you have 3 or 5 first-order interactions with the factor dummies/effects and 2 or 6 second-order interactions, given that you test each group against all others. That is a total of 20 or 44 tests, which makes false positives very likely (if you test at the 95% confidence level). Additionally, please ask yourself whether these interactions can even be interpreted in a meaningful way. Therefore, I would advise you to focus on some specific interactions that you expect to be relevant. If you really want to explore all interactions, test entire interaction terms (e.g. fac1:x1, not specific contrasts) first. For this you have to fit 8 models, each including one factor-continuous interaction, then compare each of them to the no-interaction model using anova().
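The procedure above could be sketched as follows, using lm() with simulated stand-in data for illustration (with your random effects you would fit lmer() from lme4 instead and test analogously; all variable names match your question):

```r
set.seed(1)

# Simulated stand-in data; your real data frame replaces this
n   <- 300
dat <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n),
  fac1 = factor(sample(c("a", "b"), n, replace = TRUE)),
  fac2 = factor(sample(c("p", "q", "r"), n, replace = TRUE))
)
dat$Y <- with(dat, x1 + 0.5 * x2 + (fac1 == "b") * x3 + rnorm(n))

# Step 1: main effects only, then drop1() with F-tests for each slope
m_main <- lm(Y ~ x1 + x2 + x3 + x4, data = dat)
drop1(m_main, test = "F")

# Step 2: add both factors and their interaction as a whole
# (drop1 respects marginality, so it only tests fac1:fac2 here)
m_fac <- lm(Y ~ x1 + x2 + x3 + x4 + fac1 * fac2, data = dat)
drop1(m_fac, test = "F")

# Step 3: add one factor-continuous interaction at a time and
# compare against the no-interaction model
m_int <- update(m_fac, . ~ . + fac1:x1)
anova(m_fac, m_int)  # does the x1 slope differ between fac1 levels?
```

Repeat the step-3 comparison for each of the 8 factor-continuous pairs; only follow up on specific contrasts for interactions that survive this screen.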

One last thing: I have assumed that you have already figured out the random variable structure of your model (i.e. what cluster variable(s) to consider and whether there should be random slopes). If not, do that first.

benimwolfspelz
  • thanks a lot for your help! ideally I would not like to do any stepwise selection process – JmO Apr 11 '22 at 16:05