(I asked this question earlier, but it was migrated to Stack Exchange and labeled 'unclear', and I couldn't edit it, so I'm trying again with a clearer version.)

I have the following data frame and need to determine whether there are statistically significant differences among the means of the Test Groups, repeating this for each Task Grouping:

set.seed(123)

Task_Grouping <- sample(c("A","B","C"),500,replace=TRUE)
Test_Group <- sample(c("Green","Yellow","Orange"),500,replace=TRUE)
TotalTime <- rnorm(500, mean = 3, sd = 3)

mydataframe <- data.frame(Task_Grouping, Test_Group, TotalTime)

For example, for Task A, I need to see if there are significant differences in the means of the Test Groups (Green, Yellow, Orange).

I've tried the following code, but something is wrong since the p.value is the same for each Test Group combination among different Task Groupings (i.e. every p-value is 0.6190578):

results <- mydataframe %>%
  group_by(Task_Grouping) %>%
  do(tidy(pairwise.t.test(mydataframe$TotalTime, mydataframe$Test_Group,
                 p.adjust.method = "BH")))

I'm also not 100% sure if a pairwise.t.test is the correct statistical test to use. To rephrase, I need to see if the Test_Group means are statistically different from one another. And then I need to repeat this analysis for each Task Grouping.

Bjorno
  • Your example is not working as it needs at least 2 levels for that factor – akrun Nov 25 '19 at 19:57
  • Please try to read this https://www2.le.ac.uk/departments/health-sciences/research/biostats/youngsurv/pdf/MShanyinde.pdf. Try to use reverse Kaplan Meier method. – Mohamed Rahouma Nov 25 '19 at 19:59
  • This still seems like a statistics question rather than a programming question. It seems like you are just asking which statistical method is correct for your hypothesis. You need to first know what test you want to do before you can implement in any language. R doesn't eliminate the need for you to first choose the right statistical method for your data. And questions about model selection belong on [stats.se], not Stack Overflow. – MrFlick Nov 25 '19 at 20:20
  • One issue with the pairwise.t.test part of my dplyr code is that I think the function is calculating the t-test across the whole data frame rather than for each group. By calling 'mydataframe$TotalTime, mydataframe$Test_Group', it references the full data frame. I need the t-test calculated for each group, not for the entire dataset. – Bjorno Nov 25 '19 at 20:27

1 Answer

Here's how you might do it using dplyr, tidyr, purrr and broom:

library(dplyr)
library(tidyr)  # provides nest() and unnest()
library(purrr)
library(broom)

mydataframe %>%
  nest(data = c(Test_Group, TotalTime)) %>%  # one row per Task_Grouping
  mutate(tidy = map(data, ~tidy(pairwise.t.test(.$TotalTime, .$Test_Group,
                                                p.adjust.method = "BH")))) %>%
  select(-data) %>%
  unnest(tidy)

Note that since we are using map, we use .$ rather than mydataframe$ to refer to the current group rather than the original table. See more examples in the broom and dplyr vignette.
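
For comparison, here is a minimal base R sketch of the same per-group idea, assuming the example data from the question. The names by_task and per_task are made up for illustration. It splits the data frame by Task_Grouping and runs pairwise.t.test on each subset, so each call only sees the rows for one task.

```r
# Recreate the example data from the question
set.seed(123)
Task_Grouping <- sample(c("A", "B", "C"), 500, replace = TRUE)
Test_Group <- sample(c("Green", "Yellow", "Orange"), 500, replace = TRUE)
TotalTime <- rnorm(500, mean = 3, sd = 3)
mydataframe <- data.frame(Task_Grouping, Test_Group, TotalTime)

# One data frame per task grouping (hypothetical helper name)
by_task <- split(mydataframe, mydataframe$Task_Grouping)

# Run the pairwise t-test on each subset, referring to the subset `d`
# and not to mydataframe, so the test uses only that task's rows
per_task <- lapply(by_task, function(d) {
  pairwise.t.test(d$TotalTime, d$Test_Group, p.adjust.method = "BH")$p.value
})

# Matrix of BH-adjusted p-values for the Test_Group pairs within Task A
per_task$A
```

Each element of per_task is the p-value matrix for one task grouping, which should make it easy to check that the values now differ between tasks.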

MrFlick