I hesitate to answer this question because I suspect the underlying statistical model is problematic, but this is the programming arm of StackExchange, so here's the programming solution:
Important points:
- A binary categorical variable represented as 0 and 1 is already dummy coded. It's certainly not problematic and at times useful to tell R that it's representing a categorical contrast via
factor
, but R will just turn that back into 0s and 1s before fitting that model.
- 0 and 1 can also be interpreted as boolean values. We can swap boolean values via negation.
- Using
dummy(lb_wght, "0")
is just providing a reversed encoding of the original binary variable. Likewise, dummy(lb_wght, "1")
is essentially a no-op -- it just gives you back the original dummy-contrasted variable.
- We can express the reversed encoding of
dummy(lb_wght, "0")
as boolean negation.
data wrangling warm up
> library("lme4")
>
> math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
> # A two-level contrast represented as dummy coding means we have 0 and 1
> # This is the same as FALSE and TRUE and so we can swap them using boolean negation
> math <- within(math, {
+ lb_wght0 <- as.numeric(!lb_wght)
+ lb_wght1 <- lb_wght
+ })
> head(math) # check what those new vars look like
id female lb_wght anti_k1 math grade occ age men spring anti nb_wght lb_wght1 lb_wght0
1 201 1 0 0 38 1 2 111 0 1 0 1 0 1
2 201 1 0 0 55 3 3 135 1 1 0 1 0 1
3 303 1 0 1 26 0 2 121 0 1 2 1 0 1
4 303 1 0 1 33 3 3 145 0 1 2 1 0 1
5 2702 0 0 0 56 0 2 100 . 1 0 1 0 1
6 2702 0 0 0 58 2 3 125 . 1 2 1 0 1
no correlation in the RE
> m <- lmer(
+ math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +
+ (0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade || id), # This line and
+ data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +
((0 + lb_wght0 | id) + (0 + lb_wght1 | id) + (0 + lb_wght0:grade | id) + (0 + grade:lb_wght1 | id))
Data: math
REML criterion at convergence: 15932
Scaled residuals:
Min 1Q Median 3Q Max
-3.06601 -0.52971 -0.00312 0.53661 2.54121
Random effects:
Groups Name Variance Std.Dev.
id lb_wght0 61.9312 7.8696
id.1 lb_wght1 84.7006 9.2033
id.2 lb_wght0:grade 0.7044 0.8393
id.3 grade:lb_wght1 0.6598 0.8123
Residual 36.3585 6.0298
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error t value
lb_wght0 35.48689 0.36619 96.91
lb_wght1 32.81813 1.35307 24.25
lb_wght0:grade 4.29283 0.09059 47.39
grade:lb_wght1 4.86449 0.32029 15.19
Correlation of Fixed Effects:
lb_wg0 lb_wg1 lb_w0:
lb_wght1 0.000
lb_wght0:gr -0.533 0.000
grd:lb_wgh1 0.000 -0.489 0.000
matching the correlation structure in the OP
> m <- lmer(
+ math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +
+ (0 + lb_wght0 + lb_wght0:grade | id) +
+ (0 + lb_wght1 + lb_wght1:grade | id),
+ data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +
(0 + lb_wght0 + lb_wght0:grade | id) + (0 + lb_wght1 + lb_wght1:grade | id)
Data: math
REML criterion at convergence: 15931.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.09189 -0.52867 -0.00063 0.53419 2.54169
Random effects:
Groups Name Variance Std.Dev. Corr
id lb_wght0 61.1969 7.8228
lb_wght0:grade 0.6711 0.8192 0.04
id.1 lb_wght1 97.4494 9.8716
lb_wght1:grade 1.4314 1.1964 -0.35
Residual 36.2755 6.0229
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error t value
lb_wght0 35.4848 0.3646 97.31
lb_wght1 32.8506 1.4214 23.11
lb_wght0:grade 4.2930 0.0902 47.60
grade:lb_wght1 4.8827 0.3414 14.30
Correlation of Fixed Effects:
lb_wg0 lb_wg1 lb_w0:
lb_wght1 0.000
lb_wght0:gr -0.527 0.000
grd:lb_wgh1 0.000 -0.567 0.000
for comparison, the OP's model:
> math <- within(math, lb_wght <- as.factor(lb_wght))
> m <- lmer(
+ math ~ 0 + lb_wght + lb_wght:grade +
+ (0 + dummy(lb_wght, "0") + dummy(lb_wght, "0"):grade|id) + # This line and
+ (0 + dummy(lb_wght, "1") + dummy(lb_wght, "1"):grade|id) , # This line
+ data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght + lb_wght:grade + (0 + dummy(lb_wght, "0") +
dummy(lb_wght, "0"):grade | id) + (0 + dummy(lb_wght, "1") + dummy(lb_wght, "1"):grade | id)
Data: math
REML criterion at convergence: 15931.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.09189 -0.52867 -0.00063 0.53419 2.54169
Random effects:
Groups Name Variance Std.Dev. Corr
id dummy(lb_wght, "0") 61.1969 7.8228
dummy(lb_wght, "0"):grade 0.6711 0.8192 0.04
id.1 dummy(lb_wght, "1") 97.4494 9.8716
dummy(lb_wght, "1"):grade 1.4314 1.1964 -0.35
Residual 36.2755 6.0229
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error t value
lb_wght0 35.4848 0.3646 97.31
lb_wght1 32.8506 1.4214 23.11
lb_wght0:grade 4.2930 0.0902 47.60
lb_wght1:grade 4.8827 0.3414 14.30
Correlation of Fixed Effects:
lb_wg0 lb_wg1 lb_w0:
lb_wght1 0.000
lb_wght0:gr -0.527 0.000
lb_wght1:gr 0.000 -0.567 0.000
edit: 2021-02-08 20:57 UTC
NB This is not the question as asked, but instead what was demanded in a post-hoc comment. Nonetheless, the programmatic generation of indicator variables and a formula using base R is perhaps interesting. This could be rewritten to use apply
.... but that seems like premature optimization for the problem at hand. I don't think it's a good idea to generalize this approach to more than two levels without really thinking about what that model would even mean, but again, this is an answer to the programming question, not an endorsement of the statistical procedure it implements.
do an arbitrary number of levels in the experimental variable
This will automatically generate all the indicator variables and the formula.
> # GOOD LUCK WITH THIS
> library("lme4")
>
> math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
> grps <- unique(math$lb_wght)
>
> form <- "math ~ 0"
>
> for(g in grps){
+ gname <- paste0("lb_wght",g)
+ math[, gname] <- ifelse(math$lb_wght == g, 1, 0)
+ term <- paste("0", gname, paste0(gname,":grade"), sep="+")
+ re <- paste0("(", term, "|id)")
+ form <- paste(form, term, re, sep="+")
+ }
>
> form <- as.formula(form)
>
> m <- lmer(form, data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + 0 + lb_wght0 + lb_wght0:grade + (0 + lb_wght0 + lb_wght0:grade |
id) + 0 + lb_wght1 + lb_wght1:grade + (0 + lb_wght1 + lb_wght1:grade | id)
Data: math
REML criterion at convergence: 15931.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.09189 -0.52867 -0.00063 0.53419 2.54169
Random effects:
Groups Name Variance Std.Dev. Corr
id lb_wght0 61.1969 7.8228
lb_wght0:grade 0.6711 0.8192 0.04
id.1 lb_wght1 97.4494 9.8716
lb_wght1:grade 1.4314 1.1964 -0.35
Residual 36.2755 6.0229
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error t value
lb_wght0 35.4848 0.3646 97.31
lb_wght1 32.8506 1.4214 23.11
lb_wght0:grade 4.2930 0.0902 47.60
grade:lb_wght1 4.8827 0.3414 14.30
Correlation of Fixed Effects:
lb_wg0 lb_wg1 lb_w0:
lb_wght1 0.000
lb_wght0:gr -0.527 0.000
grd:lb_wgh1 0.000 -0.567 0.000