I have a data frame like this:

   diagnosis  A  B  C  D
1        yes  1  1  0  1
2         no  0  1  0  1
3        yes  0  1  0  1
4        yes  1  1  1  1
5        yes NA  1 NA  0
6         no  1 NA  0  1
7        yes  1  0  0  0
8         no  0  0  1  1
9         no  0  1  1 NA
10        no  1  0  1  1

A, B, C, and D are the questions in my test. A "1" means the participant answered correctly and a "0" means the answer was wrong.

I want to run a two-sample t-test comparing the two diagnosis groups for each question and for the total test score.

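These are the steps I have taken so far:

library(dplyr)    # %>%, group_by
library(tidyr)    # pivot_longer
library(rstatix)  # get_summary_stats, t_test, adjust_pvalue, add_significance

#calculate sum score per participant (NA if any answer is missing)
mydf <- cbind(mydf, Total = rowSums(mydf[,2:5]))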
#Reshape the tibble from wide to long format 
mydf <- mydf %>%
  pivot_longer(!diagnosis, names_to = "Questions", values_to = "Score")
#summary of my data
Sumdf <- mydf %>%
  group_by(Questions, diagnosis) %>%
  get_summary_stats(Score, type = "mean_sd")
Sumdf
# A tibble: 10 x 6
   diagnosis Questions variable     n  mean    sd
   <chr>     <chr>     <chr>    <dbl> <dbl> <dbl>
 1 no        A         Score        5  0.4  0.548
 2 yes       A         Score        4  0.75 0.5  
 3 no        B         Score        4  0.5  0.577
 4 yes       B         Score        5  0.8  0.447
 5 no        C         Score        5  0.6  0.548
 6 yes       C         Score        4  0.25 0.5  
 7 no        D         Score        4  1    0    
 8 yes       D         Score        5  0.6  0.548
 9 no        Total     Score        3  2.33 0.577
10 yes       Total     Score        4  2.5  1.29 

From this point, how can I use t-tests to compare the group means for each question and for the total score across diagnoses?

I found something like this on the internet:

#Run T-test
ttest <- mydf %>%
  group_by(Questions) %>%
  t_test(Score ~ diagnosis) %>%
  adjust_pvalue(method = "BH") %>%
  add_significance()

And this is what I got:

[image: output of the t-test code]

But as you can see, the n values here are not correct (because I had NAs), and I don't understand why the adjusted p-values are identical across the questions. I read that adjusted p-values are preferable when running multiple t-tests, but I am not sure about it. I also want to include the means and SDs in my table (I plan to knit this script to PDF with papaja).
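One possible reason for identical adjusted p-values is the Benjamini-Hochberg ("BH") procedure itself: after scaling each p-value by the number of tests divided by its rank, it takes a running minimum from the largest p-value downward, so several tests can end up sharing one adjusted value. A minimal sketch with base R's `p.adjust`, using made-up p-values (not the ones from the output above):

```r
# Hypothetical raw p-values for five tests, for illustration only
p_raw <- c(0.20, 0.30, 0.35, 0.36, 0.90)

# Benjamini-Hochberg adjustment: p * n / rank, then a cumulative
# minimum taken from the largest p-value downward
p_bh <- p.adjust(p_raw, method = "BH")

print(p_bh)
# The first four adjusted values collapse to the same number (0.45),
# because 0.36 * 5 / 4 = 0.45 caps everything ranked below it.
```

So ties in the adjusted column do not by themselves indicate a bug; they can arise whenever the raw p-values are close together relative to their ranks.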

So, is there another way to run multiple t-tests, or does what I found look trustworthy? And, as the code suggests, should I rely on the adjusted p-values?
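As one alternative, the same comparison can be sketched in base R, which makes it easy to see the per-group n after dropping NAs and to keep the means and SDs next to each test. This is only a sketch under assumptions: the data frame is retyped from the example above, `results` and the column names are illustrative, and `t.test` runs Welch's test by default.

```r
# Example data retyped from the question
mydf <- data.frame(
  diagnosis = c("yes", "no", "yes", "yes", "yes", "no", "yes", "no", "no", "no"),
  A = c(1, 0, 0, 1, NA, 1, 1, 0, 0, 1),
  B = c(1, 1, 1, 1, 1, NA, 0, 0, 1, 0),
  C = c(0, 0, 0, 1, NA, 0, 0, 1, 1, 1),
  D = c(1, 1, 1, 1, 0, 1, 0, 1, NA, 1)
)
# Sum score per participant; NA if any answer is missing
mydf$Total <- rowSums(mydf[, c("A", "B", "C", "D")])

items <- c("A", "B", "C", "D", "Total")

# One Welch two-sample t-test per item; NAs are dropped item by item
results <- do.call(rbind, lapply(items, function(item) {
  score <- mydf[[item]]
  keep  <- !is.na(score)
  no    <- score[keep & mydf$diagnosis == "no"]
  yes   <- score[keep & mydf$diagnosis == "yes"]
  tt    <- t.test(no, yes)
  data.frame(
    Questions = item,
    n_no  = length(no),  mean_no  = mean(no),  sd_no  = sd(no),
    n_yes = length(yes), mean_yes = mean(yes), sd_yes = sd(yes),
    statistic = unname(tt$statistic),
    df        = unname(tt$parameter),
    p         = tt$p.value
  )
}))

# Adjust across the five tests with Benjamini-Hochberg
results$p.adj <- p.adjust(results$p, method = "BH")
print(results)
```

The n, mean, and sd columns here should agree with the summary table above, since both drop missing scores within each question.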

Thank you so much!

dplyr
  • A t-test seems like the wrong type of test here. You have binary data, and the proportions of correct answers for each question will be constrained between 0 and 1. Perhaps you should tabulate the results and run a chi-square or Fisher's test? – Allan Cameron Sep 13 '22 at 19:13
  • @Allan Cameron, thank you for your comment. So, do you think I should treat the answers as True/False categorical values and then put them into a correlation analysis? – dplyr Sep 14 '22 at 14:46
  • You can use TRUE/FALSE or 0/1; it doesn't really matter. – Allan Cameron Sep 14 '22 at 16:06
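
The tabulate-and-test approach suggested in the comments might look roughly like the following in base R. It is a sketch, not a recommendation from the commenter: the data frame is retyped from the question, `fisher_p` is an illustrative name, and the choice of Fisher's exact test over chi-square is an assumption made here because the cell counts are small.

```r
# Example data retyped from the question
mydf <- data.frame(
  diagnosis = c("yes", "no", "yes", "yes", "yes", "no", "yes", "no", "no", "no"),
  A = c(1, 0, 0, 1, NA, 1, 1, 0, 0, 1),
  B = c(1, 1, 1, 1, 1, NA, 0, 0, 1, 0),
  C = c(0, 0, 0, 1, NA, 0, 0, 1, 1, 1),
  D = c(1, 1, 1, 1, 0, 1, 0, 1, NA, 1)
)

questions <- c("A", "B", "C", "D")

# For each question, cross-tabulate diagnosis against the 0/1 answer
# (table() drops NAs by default) and run Fisher's exact test
fisher_p <- sapply(questions, function(q) {
  tab <- table(mydf$diagnosis, mydf[[q]])
  fisher.test(tab)$p.value
})

# Optional multiple-testing adjustment across the four questions
fisher_p_adj <- p.adjust(fisher_p, method = "BH")

print(data.frame(p = fisher_p, p.adj = fisher_p_adj))
```

The total score is not binary, so it is left out of this sketch and would need a different test.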
