Mixed-effects model for stepwise regression (lme4 syntax)

Question

I am trying to put together a maximal model for stepwise regression (backward selection) using lme4 syntax and would be very grateful for any help, as this is the first time I am doing this.

Here's my situation: as part of a linguistic research project, I conducted a psycholinguistic experiment in six different locations (VARIETY, levels: U, V, W, X, Y, Z). The experiment tested two linguistic variables, A and B (FEATURE_A and FEATURE_B in the data), with two levels each (TRUE, FALSE). These variables were crossed, giving four combinations: A(TRUE)+B(TRUE), A(TRUE)+B(FALSE), A(FALSE)+B(TRUE), and A(FALSE)+B(FALSE). Each combination was tested using two different lexicalizations, resulting in eight test items (ITEM, levels: S1-S8).

Because of the experiment's Latin square design, the eight ITEMs were varied across four material sets (MS, levels: MS1-MS4). The participants' (SUBJECT) ratings were normalized to z-scores (Z_SCORE). The dataset that I want to analyze looks as follows:

[Image: dataset example]

The aim of the experiment was to find out (1) whether there were effects regarding the various combinations of syntactic variables A and B, and (2) whether there were variety-specific effects.

As I understand it, the fixed effects in a mixed model would be VARIETY, FEATURE_A, and FEATURE_B, and the random effects would be SUBJECT, ITEM, and MS.

My first attempt at a maximal model, in lme4 syntax and based on the literature I have read so far, looks like this: Z_SCORE ~ VARIETY * FEATURE_A * FEATURE_B + (1|SUBJECT) + (1|ITEM) + (1|MS). I have already carried out a stepwise regression in R and it all looks good. However, I am not sure whether my lme4 formula really reflects the experimental setup. In particular, I am unsure whether the relationship between ITEM and MS is accounted for.
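For concreteness, the formula above could be passed to lme4 roughly as sketched below. This is a sketch under stated assumptions, not a recommended specification: the data frame `toy` and the helper `rotate` are hypothetical names, the ratings are random noise laid out like the reproducible example in this post, and `REML = FALSE` is an assumption made because likelihood-based comparisons between models with different fixed effects need maximum-likelihood fits. The model is only fitted if the lme4 package is installed.

```r
# Hypothetical toy data laid out like the design described above.
set.seed(321)

# Item order for material set k + 1: S1-S4 and S5-S8 are each rotated by k.
rotate <- function(k) paste0("S", c((0:3 + k) %% 4 + 1, (0:3 + k) %% 4 + 5))

toy <- data.frame(
  SUBJECT   = rep(sprintf("SUBJ_%03d", 1:24), each = 8),
  VARIETY   = rep(c("U", "V", "W", "X", "Y", "Z"), each = 32),
  FEATURE_A = rep(c("TRUE", "TRUE", "FALSE", "FALSE"), times = 48),
  FEATURE_B = rep(c("TRUE", "FALSE"), times = 96),
  ITEM      = rep(unlist(lapply(0:3, rotate)), times = 6),
  MS        = rep(rep(paste0("MS", 1:4), each = 8), times = 6),
  Z_SCORE   = rnorm(192),          # pure noise, for illustration only
  stringsAsFactors = TRUE
)

# The formula from the question, unchanged.
full_formula <- Z_SCORE ~ VARIETY * FEATURE_A * FEATURE_B +
  (1 | SUBJECT) + (1 | ITEM) + (1 | MS)

if (requireNamespace("lme4", quietly = TRUE)) {
  # REML = FALSE is an assumption (ML fit), see the note above.
  fit <- lme4::lmer(full_formula, data = toy, REML = FALSE)
  print(lme4::ngrps(fit))       # number of levels per grouping factor
  print(lme4::isSingular(fit))  # noise-only ratings can give a singular fit
}
```

With noise-only ratings the estimated random-effect variances may sit at or near zero, so the fit itself says nothing about whether the random-effects structure suits the real data.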

Any pointers or ideas are much appreciated.

Edit: Here's a reproducible example of my dataset (I am quite new to R so I apologize for not providing it in 'dput' format, which I am not familiar with yet):

# Toy data: 24 subjects x 8 ratings each = 192 rows
SUBJECT <- sprintf("SUBJ_%03d", 1:24)   # "SUBJ_001" ... "SUBJ_024"
SUBJECT <- rep(SUBJECT, each = 8)

# Six varieties (U-Z, as described above), 4 subjects = 32 rows each
VARIETY <- LETTERS[21:26]
VARIETY <- rep(VARIETY, each = 32)

# The two crossed variables, repeated to 192 rows
FEATURE_A <- c("TRUE", "TRUE", "FALSE", "FALSE")
FEATURE_A <- rep(FEATURE_A, 48)
FEATURE_B <- c("TRUE", "FALSE")
FEATURE_B <- rep(FEATURE_B, 96)

# Item order within each of the four material sets
MS1 <- c("S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8")
MS2 <- c("S2", "S3", "S4", "S1", "S6", "S7", "S8", "S5")
MS3 <- c("S3", "S4", "S1", "S2", "S7", "S8", "S5", "S6")
MS4 <- c("S4", "S1", "S2", "S3", "S8", "S5", "S6", "S7")
ITEM <- c(MS1, MS2, MS3, MS4)
ITEM <- rep(ITEM, 6)

# Material set label for each row (one set per subject)
MS <- c("MS1", "MS2", "MS3", "MS4")
MS <- rep(MS, each = 8)
MS <- rep(MS, 6)

# Random placeholder ratings
set.seed(321)
Z_SCORE <- rnorm(192, sd = 1)

df <- data.frame(SUBJECT, VARIETY, FEATURE_A, FEATURE_B, ITEM, MS, Z_SCORE)
df
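One way to see how ITEM, MS, and SUBJECT relate in this example is to cross-tabulate them. The sketch below uses base R only and rebuilds the three columns under hypothetical lowercase names (`item`, `ms`, `subject`, plus a helper `rotate`) so that it runs on its own. It describes the toy layout above, which may or may not match the real dataset.

```r
# Rebuild the design columns of the toy example (hypothetical names).
# Item order for material set k + 1: S1-S4 and S5-S8 are each rotated by k.
rotate  <- function(k) paste0("S", c((0:3 + k) %% 4 + 1, (0:3 + k) %% 4 + 5))
item    <- rep(unlist(lapply(0:3, rotate)), times = 6)
ms      <- rep(rep(paste0("MS", 1:4), each = 8), times = 6)
subject <- rep(sprintf("SUBJ_%03d", 1:24), each = 8)

# Items x material sets: non-zero counts in every cell mean the two
# factors are crossed (every item occurs in every material set).
item_by_ms <- table(item, ms)
print(item_by_ms)

# Subjects x material sets: count how many sets each subject appears in.
# A value of 1 for every subject means subjects are nested within sets.
sets_per_subject <- rowSums(table(subject, ms) > 0)
print(table(sets_per_subject))
```

In this toy layout every item occurs in every material set, while each subject belongs to exactly one set. Running the same tabulation on the real data would show whether it has the same structure.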

Please provide a reproducible example of your dataset (dput format preferred). - Behnam Hedayat, May 26 '21 at 15:29
I edited the post to provide a reproducible example of my dataset. I hope this works. - Jake, May 28 '21 at 07:45
