I have a data frame like this:
   diagnosis  A  B  C  D
1        yes  1  1  0  1
2         no  0  1  0  1
3        yes  0  1  0  1
4        yes  1  1  1  1
5        yes NA  1 NA  0
6         no  1 NA  0  1
7        yes  1  0  0  0
8         no  0  0  1  1
9         no  0  1  1 NA
10        no  1  0  1  1
A, B, C, and D are the questions in my test. A "1" means the participant answered the question correctly and a "0" means the answer was wrong.
I want to run a two-sample t-test for each question, and for the total test score, comparing the two diagnosis groups.
These are the steps I have taken so far:
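To make the example reproducible, the data above can be rebuilt roughly as follows. This is a sketch: the name mydf matches the code later in the post, and storing the answers as plain numeric columns is an assumption.

```r
# Rebuild the example data from the table above; NA marks a missing answer.
mydf <- data.frame(
  diagnosis = c("yes", "no", "yes", "yes", "yes", "no", "yes", "no", "no", "no"),
  A = c(1, 0, 0, 1, NA, 1, 1, 0, 0, 1),
  B = c(1, 1, 1, 1, 1, NA, 0, 0, 1, 0),
  C = c(0, 0, 0, 1, NA, 0, 0, 1, 1, 1),
  D = c(1, 1, 1, 1, 0, 1, 0, 1, NA, 1)
)

# Quick check of the structure and the number of missing answers.
str(mydf)
sum(is.na(mydf))
```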
library(dplyr)
library(tidyr)
library(rstatix)

# Calculate the sum score per participant (NA if any answer is missing)
mydf <- cbind(mydf, Total = rowSums(mydf[, 2:5]))
# Reshape from wide to long format
mydf <- mydf %>%
  pivot_longer(!diagnosis, names_to = "Questions", values_to = "Score")
# Summary of the data
Sumdf <- mydf %>%
  group_by(Questions, diagnosis) %>%
  get_summary_stats(Score, type = "mean_sd")
Sumdf
# A tibble: 10 x 6
   diagnosis Questions variable     n  mean    sd
   <chr>     <chr>     <chr>    <dbl> <dbl> <dbl>
 1 no        A         Score        5  0.4  0.548
 2 yes       A         Score        4  0.75 0.5
 3 no        B         Score        4  0.5  0.577
 4 yes       B         Score        5  0.8  0.447
 5 no        C         Score        5  0.6  0.548
 6 yes       C         Score        4  0.25 0.5
 7 no        D         Score        4  1    0
 8 yes       D         Score        5  0.6  0.548
 9 no        Total     Score        3  2.33 0.577
10 yes       Total     Score        4  2.5  1.29
From here, how can I use t-tests to compare these means between the diagnosis groups, for each question and for the total score?
I found something like this online:
# Run t-tests
ttest <- mydf %>%
  group_by(Questions) %>%
  t_test(Score ~ diagnosis) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance()
And this is what I got:
But as you can see, the n values here are wrong (because I had NAs), and I don't understand why the adjusted p-values are identical across questions. I have read that adjusted p-values are preferable when running multiple t-tests, but I am not sure about that. I would also like to include the means and SDs in my table (I plan to knit this script to PDF with papaja).
So, is there another way to run multiple t-tests, or does the approach I found look trustworthy? And, as the code suggests, should I rely on the adjusted p-values?
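To show the kind of table I am after, here is a rough base-R sketch that I put together as a cross-check. It is not meant as the right way to do this. It assumes the wide data from the top of the post, uses Welch tests (the default of t.test), drops NAs separately for each question, and applies the BH adjustment across the five tests. The helper name compare_groups is my own invention.

```r
# Wide example data (same values as the table at the top of the post).
mydf <- data.frame(
  diagnosis = c("yes", "no", "yes", "yes", "yes", "no", "yes", "no", "no", "no"),
  A = c(1, 0, 0, 1, NA, 1, 1, 0, 0, 1),
  B = c(1, 1, 1, 1, 1, NA, 0, 0, 1, 0),
  C = c(0, 0, 0, 1, NA, 0, 0, 1, 1, 1),
  D = c(1, 1, 1, 1, 0, 1, 0, 1, NA, 1)
)
# Total is NA for any participant with a missing answer (rowSums default).
mydf$Total <- rowSums(mydf[, c("A", "B", "C", "D")])

# Hypothetical helper: one Welch two-sample t-test for a single column,
# reporting the non-missing n, mean, and SD of each diagnosis group.
compare_groups <- function(col) {
  x   <- mydf[[col]]
  no  <- x[mydf$diagnosis == "no"  & !is.na(x)]
  yes <- x[mydf$diagnosis == "yes" & !is.na(x)]
  tt  <- t.test(no, yes)  # var.equal = FALSE by default, i.e. Welch
  data.frame(
    Questions = col,
    n_no  = length(no),  mean_no  = mean(no),  sd_no  = sd(no),
    n_yes = length(yes), mean_yes = mean(yes), sd_yes = sd(yes),
    statistic = unname(tt$statistic),
    df        = unname(tt$parameter),
    p         = tt$p.value
  )
}

# Run the helper for every question plus the total, then stack the rows.
results <- do.call(rbind, lapply(c("A", "B", "C", "D", "Total"), compare_groups))

# Benjamini-Hochberg adjustment across the five tests.
results$p.adj <- p.adjust(results$p, method = "BH")

print(results, digits = 3)
```

The n_no and n_yes columns here count only non-missing answers, so they should agree with the n column in my summary table above.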
Thank you so much!