I want to create a function (or some way to run the test in one go) that performs a Mann-Whitney test. I want to compare logSG values between the two CC conditions within each Time, so for the following data frame I want three p-values, one per Time.

My sample dataframe:

result <- structure(list(Time = c("30", "30", "30", "30", "30", "30", "30", 
"30", "30", "30", "30", "30", "30", "60", "60", "60", "60", "60", 
"60", "60", "60", "60", "60", "90", "90", "90", "90", "90", "90", 
"90", "90", "90"), CC = c("Scramble", "Scramble", "Scramble", 
"Scramble", "Scramble", "Scramble", "Scramble", "Scramble", "KD", 
"KD", "KD", "KD", "KD", "Scramble", "Scramble", "Scramble", "Scramble", 
"Scramble", "KD", "KD", "KD", "KD", "KD", "Scramble", "Scramble", 
"Scramble", "Scramble", "KD", "KD", "KD", "KD", "KD"), logSG = c(0, 
6.29469069760774, 6.97548510669835, 0, 0, 5.6529880324294, 0, 
0, 0, 0, 0, 5.84818081635987, 0, 6.33960454566506, 0.410736902037262, 
0, 0, 0, 0, 0.0294484401648161, 0, 1.03061195077248, -1.30321174424293, 
-1.25902114646857, 0, 0, 0.787059500696643, 3.54611686297603, 
0, 0, -0.297732408305282, 0)), row.names = c(NA, -32L), class = c("data.table", 
"data.frame"))

I tried the following for each time point:

e <- result[result$Time == 30,]
wilcox.test(logSG ~ CC, data = e)

This is clunky and inefficient.

Or, I'm having trouble getting this to work:

t <- result %>% group_by(Time) %>% do(te=wilcox.test(logSG ~ CC))

If possible, I'd like to learn how to do this using both dplyr and m/s/apply.

CeC

1 Answer

If we are using do, then pass the data for each group explicitly with data = .

library(dplyr)
result %>%
     group_by(Time) %>%
     do(te=wilcox.test(logSG ~ CC, data = .)) 

Or use map on a nested dataset (nest comes from tidyr):

library(purrr)
library(tidyr)
result %>%
   group_by(Time) %>%
   nest() %>%
   mutate(te = map(data, ~ wilcox.test(logSG ~ CC, data = .x) ))
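
Since the question asks for one p-value per Time, the nested approach can be extended to pull `p.value` out of each test object. The following is a minimal sketch, not part of the original answer: the data frame is rebuilt from the question's dput() output as a plain data.frame, the `p_value` column name is made up for illustration, and `exact = FALSE` is an assumption added to request the normal approximation up front, because the data contain ties.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Sample data rebuilt from the question's dput() output (plain data.frame)
result <- data.frame(
  Time = rep(c("30", "60", "90"), times = c(13, 10, 9)),
  CC = rep(rep(c("Scramble", "KD"), 3), times = c(8, 5, 5, 5, 4, 5)),
  logSG = c(0, 6.29469069760774, 6.97548510669835, 0, 0, 5.6529880324294,
            0, 0, 0, 0, 0, 5.84818081635987, 0, 6.33960454566506,
            0.410736902037262, 0, 0, 0, 0, 0.0294484401648161, 0,
            1.03061195077248, -1.30321174424293, -1.25902114646857, 0, 0,
            0.787059500696643, 3.54611686297603, 0, 0, -0.297732408305282, 0)
)

p_values <- result %>%
  group_by(Time) %>%
  nest() %>%
  mutate(
    # One test per Time; exact = FALSE is an assumption (the data have ties)
    te = map(data, ~ wilcox.test(logSG ~ CC, data = .x, exact = FALSE)),
    # Extract the numeric p-value from each htest object
    p_value = map_dbl(te, ~ .x$p.value)
  ) %>%
  ungroup() %>%
  select(Time, p_value)

p_values
```

The result is a three-row tibble with one p-value per Time, which is easier to inspect or join back than a list column of test objects.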
akrun
  • Thank you so much! Do you think `dplyr` is a better way to go than `sapply` or something similar? – CeC Jan 27 '20 at 00:38
  • @CeC As this is a group-by operation, the `tidyverse` approach is easier to understand. But you can also do this in `base R` after `split`ting, i.e. `lapply(split(result, result$Time), function(dat) wilcox.test(logSG ~ CC, data = dat))` – akrun Jan 27 '20 at 00:44
  • I didn't think to use `split`. Thank you for your help! – CeC Jan 27 '20 at 01:02
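
The base R route mentioned in the comments can return the p-values directly by swapping `lapply` for `sapply` and extracting `p.value` inside the function. This is a hedged sketch rather than the commenter's exact code: the data frame is rebuilt from the question's dput() output, and `exact = FALSE` is an assumption added because the data contain ties.

```r
# Sample data rebuilt from the question's dput() output (plain data.frame)
result <- data.frame(
  Time = rep(c("30", "60", "90"), times = c(13, 10, 9)),
  CC = rep(rep(c("Scramble", "KD"), 3), times = c(8, 5, 5, 5, 4, 5)),
  logSG = c(0, 6.29469069760774, 6.97548510669835, 0, 0, 5.6529880324294,
            0, 0, 0, 0, 0, 5.84818081635987, 0, 6.33960454566506,
            0.410736902037262, 0, 0, 0, 0, 0.0294484401648161, 0,
            1.03061195077248, -1.30321174424293, -1.25902114646857, 0, 0,
            0.787059500696643, 3.54611686297603, 0, 0, -0.297732408305282, 0)
)

# split() yields one data frame per Time; sapply() collects one p-value from each
p_values <- sapply(
  split(result, result$Time),
  function(dat) wilcox.test(logSG ~ CC, data = dat, exact = FALSE)$p.value
)

p_values  # named numeric vector, names are the Time values
```

Because `split` names the list elements by Time, the returned vector is already labelled with "30", "60", and "90".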