I am trying to put together a maximal model for stepwise regression ('backward selection') using lme4 syntax and would be very grateful for any help as this is the first time I am doing this.
Here's my situation: In the context of a linguistic research project, I conducted a psycholinguistic experiment in six different locations (VARIETY, levels: U, V, W, X, Y, Z). The experiment tested two linguistic variables A and B with two levels each (TRUE, FALSE). These variables were crossed: A(TRUE)+B(TRUE); A(TRUE)+B(FALSE); A(FALSE)+B(TRUE); A(FALSE)+B(FALSE) and each combination was tested using two different lexicalizations, resulting in eight test items (ITEM, levels: S1-S8).
Because of the experiment's Latin square design, the eight ITEMs were varied across four material sets (MS, levels: MS1-4). The participants' (SUBJECT) ratings were normalized to z-scores (Z_SCORE). The structure of the dataset that I want to analyze is reproduced in the example at the end of this post.
The aim of the experiment was to find out (1) whether there were effects regarding the various combinations of syntactic variables A and B, and (2) whether there were variety-specific effects.
So, as I understand it, the fixed effects in a mixed model would be VARIETY, FEATURE_A and FEATURE_B (the variables A and B described above), and the random effects would be SUBJECT, ITEM and MS.
My first try at a maximal model, in lme4 syntax and based on the literature that I have read so far, looks like this: Z_SCORE ~ VARIETY * FEATURE_A * FEATURE_B + (1|SUBJECT) + (1|ITEM) + (1|MS). I have already carried out a stepwise regression in R and the results look plausible. However, I am not sure whether my lme4 formula really reflects the experimental setup. In particular, I am unsure whether the relationship between ITEM and MS is accounted for.
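To make the formula concrete, here is a minimal sketch of how it could be fitted and how one backward-selection step could be tested. It assumes the lme4 package is installed. The helper name `fit_and_compare` and its argument `d` are hypothetical, and fitting with maximum likelihood (REML = FALSE) is my assumption, chosen because the two models differ in their fixed effects. The sketch only restates the formula above; it does not show that the random-effects structure is appropriate.

```r
library(lme4)  # assumption: lme4 is installed

# Hypothetical helper: fit the maximal model from the formula above and
# the model without the three-way interaction, then compare the two.
fit_and_compare <- function(d) {
  # Maximal fixed-effects structure with random intercepts only
  full <- lmer(
    Z_SCORE ~ VARIETY * FEATURE_A * FEATURE_B +
      (1 | SUBJECT) + (1 | ITEM) + (1 | MS),
    data = d, REML = FALSE
  )
  # Same model, keeping all main effects and two-way interactions only
  reduced <- lmer(
    Z_SCORE ~ (VARIETY + FEATURE_A + FEATURE_B)^2 +
      (1 | SUBJECT) + (1 | ITEM) + (1 | MS),
    data = d, REML = FALSE
  )
  # Likelihood-ratio test of the three-way interaction
  list(full = full, reduced = reduced, lrt = anova(reduced, full))
}
```

Calling `fit_and_compare(df)` on the data frame from the reproducible example at the end of this post returns both fits and the comparison table. Because Z_SCORE is random noise in that example, lme4 may report a singular fit there.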
Any pointers or ideas are much appreciated.
Edit: Here's a reproducible example of my dataset (I am quite new to R, so I apologize for not providing it in 'dput' format, which I am not familiar with yet):
SUBJECT <- sprintf("SUBJ_%03d", 1:24)  # "SUBJ_001" to "SUBJ_024"
SUBJECT <- rep(SUBJECT, each = 8)
VARIETY <- c("U", "V", "W", "X", "Y", "Z")  # the six locations described above
VARIETY <- rep(VARIETY, each = 32)
FEATURE_A <- c("TRUE", "TRUE", "FALSE", "FALSE")
FEATURE_A <- rep(FEATURE_A, 48)
FEATURE_B <- c("TRUE", "FALSE")
FEATURE_B <- rep(FEATURE_B, 96)
MS1 <- c("S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8")
MS2 <- c("S2", "S3", "S4", "S1", "S6", "S7", "S8", "S5")
MS3 <- c("S3", "S4", "S1", "S2", "S7", "S8", "S5", "S6")
MS4 <- c("S4", "S1", "S2", "S3", "S8", "S5", "S6", "S7")
ITEM <- c(MS1, MS2, MS3, MS4)
ITEM <- rep(ITEM, 6)
MS <- c("MS1", "MS2", "MS3", "MS4")
MS <- rep(MS, each = 8)
MS <- rep(MS, 6)
set.seed(321)
Z_SCORE <- rnorm(192, sd = 1)
df <- data.frame(SUBJECT, VARIETY, FEATURE_A, FEATURE_B, ITEM, MS, Z_SCORE)
df
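Since my main doubt concerns how ITEM and MS relate, the following base-R sketch tabulates the design. The helper name `describe_design` is hypothetical, and it assumes a data frame with the same column names as the one built above.

```r
# Hypothetical helper: summarise how ITEM, MS and the A x B conditions relate.
describe_design <- function(d) {
  condition <- interaction(d$FEATURE_A, d$FEATURE_B, sep = "/")
  list(
    # Rows per ITEM x MS cell; no empty cells means ITEM and MS are crossed
    item_by_ms = table(ITEM = d$ITEM, MS = d$MS),
    # Condition (FEATURE_A/FEATURE_B) each ITEM has within each material set
    condition_by_item_ms = tapply(
      as.character(condition), list(d$ITEM, d$MS),
      function(x) paste(unique(x), collapse = ",")
    ),
    # Number of distinct material sets each SUBJECT saw
    ms_per_subject = tapply(d$MS, d$SUBJECT, function(x) length(unique(x)))
  )
}

# Usage with the data frame built above: describe_design(df)
```

On the example data this should show that every ITEM occurs in every MS, that the condition of an ITEM is determined by the MS it appears in, and that each SUBJECT saw exactly one MS.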