
I am running some multilevel regressions. Somehow, the predictor "trst_means" appears twice in the regression summary (and probably distorts the results).

library(lme4)
swd.mod1 <- lmer(stfdem ~ 1 + gndr + agea.rc + trst_means + icpdwk2 + eisced.rc + hinctnta.rc + clsprty + polintr + (1 | cntry),
                 REML = TRUE, data = ESS_subset)
summary(swd.mod1)

As you can see below, trst_means appears as trst_means2 and trst_means3.

Linear mixed model fit by REML ['lmerMod']
Formula: stfdem ~ 1 + gndr + agea.rc + trst_means + icpdwk2 + eisced.rc +  
    hinctnta.rc + clsprty + polintr + (1 | cntry)
   Data: ESS_subset

REML criterion at convergence: 145674.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.5415 -0.6089  0.0563  0.6537  3.6293 

Random effects:
 Groups   Name        Variance Std.Dev.
 cntry    (Intercept) 0.5894   0.7677  
 Residual             3.7336   1.9323  
Number of obs: 35014, groups:  cntry, 25

Fixed effects:
             Estimate Std. Error t value
(Intercept)  3.852760   0.178131  21.629
gndr        -0.146807   0.021119  -6.951
agea.rc      0.008464   0.006412   1.320
trst_means2  1.882941   0.024889  75.655
trst_means3  3.286516   0.042815  76.760
icpdwk2      0.056214   0.023767   2.365
eisced.rc    0.055574   0.014223   3.907
hinctnta.rc  0.214884   0.015703  13.684
clsprty     -0.195291   0.022631  -8.629
polintr     -0.001448   0.013307  -0.109

`trst_means` is a variable I recoded as follows:

trstinst$trst_means.rc <- as.data.frame(sapply(trstinst, function(x)cut(x, 
           breaks = c(0, 3.6, 7.2, 10), 
           labels = c(1,2,3)))
           )
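
For reference, here is a minimal sketch of how `cut()` behaves on a single numeric vector with these breaks. The vector `trust_score` is a hypothetical stand-in, not a column of the ESS data. It suggests that `cut()` itself returns a factor, and that with the default settings an exact 0 falls outside the first interval unless `include.lowest = TRUE` is set.

```r
# Hypothetical 0-10 trust scores (assumption: not taken from the ESS data)
trust_score <- c(0, 2.5, 3.6, 5, 7.2, 8.1, 10)

# Default intervals are (0,3.6], (3.6,7.2], (7.2,10], so an exact 0 becomes NA
binned_default <- cut(trust_score,
                      breaks = c(0, 3.6, 7.2, 10),
                      labels = c(1, 2, 3))

# include.lowest = TRUE closes the first interval at 0: [0,3.6]
binned <- cut(trust_score,
              breaks = c(0, 3.6, 7.2, 10),
              labels = c(1, 2, 3),
              include.lowest = TRUE)

class(binned)          # cut() returns a factor
as.character(binned)   # group labels as strings
```

Applying `cut()` to one column at a time, rather than through `sapply()` over the whole data frame, keeps the result a factor.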

Here is an extract of the data frame I am working with:

df = dput(head(ESS_subset))
structure(list(idno = c(10105L, 10107L, 10109L, 10201L, 10202L, 
10208L), cntry = c("BE", "BE", "BE", "BE", "BE", "BE"), stfdem = c(5L, 
1L, 6L, 9L, 2L, 7L), gndr = c(1L, 1L, 2L, 2L, 2L, 1L), clsprty = c(2L, 
2L, 2L, 2L, 1L, 1L), polintr = c(3L, 3L, 3L, 3L, 3L, 2L), icpdwk2 = c(2L, 
1L, 2L, 1L, 1L, 1L), agea.rc = c(1, 3, 7, 2, 2, 3), hinctnta.rc = c(NA, 
2, 1, 1, 2, 2), eisced.rc = c(2, 4, 2, 2, 2, 4), trst_means = c("2", 
"1", "1", "2", "2", "2"), wl = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_), 
    wl = c(NA, NA, NA, NA, 0, 1)), row.names = c(NA, 6L), class = "data.frame")

Thank you!

  • Try `class(ESS_subset$trst_means)`. My guess is that it will return `factor`. Factors are categorical variables, and so R defaults to dummy-coding these variables. Assuming that `trst_means` has levels 1, 2, and 3, it will use 1 as the reference, and thus you see the main effects against 2 (`trst_means2`) and 3 (`trst_means3`). – slamballais May 16 '21 at 17:53
  • `trst_means2 1.882941 0.024889 75.655 trst_means3 3.286516 0.042815 76.760` are the estimates relative to the reference level `trst_means1` – TarJae May 16 '21 at 17:56
  • Okay, I have tried it and it returned character - does your description apply to this data type as well? –  May 16 '21 at 18:00
  • I see, how can I have only one estimate, though? –  May 16 '21 at 18:02
  • Well, you have 3 groups; how would you define the coefficient? As the overall effect across the groups? Then you may want to make `trst_means` an ordered factor (where 1 < 2 < 3). However, this will simply add `trst_means` as a numeric variable (1, 2, 3) into your regression. This is not optimal, since you originally started with data running from 0 to 10, and now it's from 1 to 3. Essentially, you lose information compared to just adding the original variable to your model. – slamballais May 16 '21 at 18:07
  • Yes, I would need the overall effect! You make a good point here. However, I have to have three groups as I need to replicate a paper that did so. –  May 16 '21 at 18:14
  • Alright, well, you could consider ordered factors, but keep in mind that they are not as straightforward to use as I'm saying here. By default, they use a polynomial contrast, so they'll also fit polynomial coefficients (see [here](https://stackoverflow.com/questions/25735636/interpretation-of-ordered-and-non-ordered-factors-vs-numerical-predictors-in-m)). Also, there may be better ways to get an overall effect for a categorical variable, but I'm not too familiar with those methods. I would recommend Googling around a bit. I hope that at least your original question is answered. – slamballais May 16 '21 at 18:23
  • Okay, I will try that. Thank you for the guidance and help & have a nice Sunday evening! –  May 16 '21 at 18:24
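
The points raised in the comments can be illustrated with a small sketch. It uses simulated data and plain `lm()` instead of `lmer()` so that it runs without extra packages. The names `d`, `trust_group`, `trust_num` and `y` are hypothetical, and the simulated effect sizes are arbitrary. It shows (a) how a character predictor with three values is expanded into two dummy columns, (b) one way to get a single slope by treating the groups as the numbers 1, 2, 3, and (c) one way to get a single overall test while keeping the predictor categorical.

```r
set.seed(1)

# Simulated stand-in data (assumption: not the ESS variables)
d <- data.frame(trust_group = sample(c("1", "2", "3"), 300, replace = TRUE))
d$y <- 4 + 1.9 * (d$trust_group == "2") + 3.3 * (d$trust_group == "3") + rnorm(300)

# (a) A character or factor predictor with k values becomes k - 1 dummy columns;
#     the first level ("1") is the reference and is absorbed into the intercept
dummy_cols <- colnames(model.matrix(~ trust_group, data = d))
dummy_cols

# (b) One slope: treat the groups as equally spaced numbers 1, 2, 3
d$trust_num <- as.numeric(d$trust_group)
fit_linear <- lm(y ~ trust_num, data = d)
coef(fit_linear)

# (c) One overall test: compare models with and without the categorical predictor
fit_null <- lm(y ~ 1, data = d)
fit_cat  <- lm(y ~ trust_group, data = d)
overall  <- anova(fit_null, fit_cat)  # a single F test on 2 degrees of freedom
overall
```

Option (b) assumes the step from group 1 to 2 equals the step from 2 to 3, which the estimates in the question (about 1.88 and 3.29) only roughly support. Option (c) keeps the three groups as in the replicated paper. For `lmer` fits, the analogous comparison is typically made by passing the two nested models to `anova()`.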
