What's the correct statistical test/R function for assessing multiple, binary test answers with different lengths?

Question

I have a pool of questions. Each participant answers the same fixed number of questions, drawn at random from that pool. I then split the participants into two groups based on another variable.

How can I assess which group did better in R?

You won't get much help by asking that kind of question in a place like this. A more appropriate question may be: "How do I perform an unpaired t-test in R?" or "How can I best visualize the distribution of my outcome variable stratified by another variable in R?". Even better, show a tiny bit of your data (see `dput`). How's that sound? — Edward, Mar 04 '20 at 13:25
Thanks Edward, this wasn't actually my original question, but someone suggested these edits. I'm OK on t-tests; I'm just not sure what the correct statistical test is when I have two groups, one where the data look like:

q1 q2 q3
1  0  1
0  1  NA
NA 0  1

and group two something like:

q1 q2 q3
1  0  NA
NA 1  0
1  0  1

(my actual data have 6 questions and 49 total participants). Thanks, — fattytuna, Mar 04 '20 at 14:30

score 0 · Answer 1 · answered Mar 13 '20 at 23:54

Let's try something like this then:

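# Simulate 49 participants, three binary questions and a random group label
# (no seed is set, so the numbers shown below will differ from run to run)
df = data.frame(id=1:49,
  q1 = sample(0:1,49,prob=c(0.7,0.3),replace=TRUE),
  q2 = sample(0:1,49,prob=c(0.5,0.5),replace=TRUE),
  q3 = sample(0:1,49,prob=c(0.3,0.7),replace=TRUE),
  group = sample(c("a","b"),49,replace=TRUE)
)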

You can test each question's association with group using Fisher's exact test on each column. For example, for q1 and group:

fisher.test(table(df$q1,df$group))

    Fisher's Exact Test for Count Data

data:  table(df$q1, df$group)
p-value = 0.5072
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.09971675 2.43186118
sample estimates:
odds ratio 
 0.5346084
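To run the same test on every question rather than q1 alone, something like the following sketch may help. It assumes the column names from the simulated data, uses an arbitrary seed, and blanks out one question per participant to imitate the NA pattern described in the comments (an assumption about the layout, not the real data). `table()` drops NA by default, so each test only uses the participants who actually saw that question. The Holm adjustment via `p.adjust()` is an optional extra for running several tests at once.

```r
set.seed(1)  # arbitrary seed, only so this sketch is repeatable
n <- 49
questions <- c("q1", "q2", "q3")

df <- data.frame(
  id    = 1:n,
  q1    = sample(0:1, n, prob = c(0.7, 0.3), replace = TRUE),
  q2    = sample(0:1, n, prob = c(0.5, 0.5), replace = TRUE),
  q3    = sample(0:1, n, prob = c(0.3, 0.7), replace = TRUE),
  group = sample(c("a", "b"), n, replace = TRUE)
)

# Assumption: each participant was not shown one randomly chosen question
skipped <- sample(questions, n, replace = TRUE)
for (q in questions) df[[q]][skipped == q] <- NA

# One Fisher's exact test per question; table() ignores the NA entries
p_raw <- sapply(questions, function(q) {
  fisher.test(table(df[[q]], df$group))$p.value
})

# Holm correction for testing several questions at once
p_holm <- p.adjust(p_raw, method = "holm")

print(data.frame(question = questions, p_raw = p_raw, p_holm = p_holm))
```

With 6 questions in the real data, only the `questions` vector would need to change.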

Alternatively, you can set up a mixed model if the questions are related and there is a participant-level effect:

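library(lme4)
library(tidyr)  # provides pivot_longer()
# Reshape to one row per participant and question (new columns: name, value)
newdf = pivot_longer(df,-c(id,group))
# Random intercept per participant; name*group lets the group effect vary by question
glmer(value ~ name*group + (1|id),data=newdf,family="binomial")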
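If a single overall answer to "which group did better" is wanted, one possible way to use the mixed model is a likelihood-ratio test of the group term. The sketch below is an illustration under stated assumptions, not the only way to do it: it drops the interaction (`name + group` instead of `name*group`) so that group has one overall effect, reshapes with base R rather than `pivot_longer`, and simulates its own data with an arbitrary seed. Because no participant effect is simulated, lme4 may report a singular fit here.

```r
library(lme4)

set.seed(2)  # arbitrary seed, only so this sketch is repeatable
n <- 49
df <- data.frame(
  id    = 1:n,
  q1    = sample(0:1, n, prob = c(0.7, 0.3), replace = TRUE),
  q2    = sample(0:1, n, prob = c(0.5, 0.5), replace = TRUE),
  q3    = sample(0:1, n, prob = c(0.3, 0.7), replace = TRUE),
  group = sample(c("a", "b"), n, replace = TRUE)
)

# Long format in base R: one row per participant and question
long <- data.frame(
  id    = rep(df$id, times = 3),
  group = rep(df$group, times = 3),
  name  = rep(c("q1", "q2", "q3"), each = n),
  value = c(df$q1, df$q2, df$q3)
)

# Model with and without an overall group effect
full <- glmer(value ~ name + group + (1 | id), data = long, family = binomial)
null <- glmer(value ~ name + (1 | id), data = long, family = binomial)

# Likelihood-ratio test for the group term
lrt <- anova(null, full)
print(lrt)
```

The sign of the `groupb` coefficient in `summary(full)` indicates which group has the higher log-odds of a correct answer, relative to group a.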

I think Fisher's exact test might be the most direct option for you.
