
I have three samples (replicates) per group, and I want to use a t-test to compare values (`MappedReadsCPM`) between groups. However, I have 4000 peaks (identified by `PeakNumber`) to compare one at a time. The following line is close, but it doesn't tell R to compare only `peak_1`, then only `peak_2`, and so on:

    t.test(MappedReadsCPM~Group, data=subset(data2, Group %in% c("1", "2")))$p.value

I don't want to print the 4000 p-values; ideally I could add them to a data frame.

    pvalues <- t.test(MappedReadsCPM~Group, data=subset(data2, Group %in% c("1", "2")))$p.value

Here is `data2`:

    PeakNumber Sample Group MappedReadsCPM
    peak_1     A      1     43.53819
    peak_2     A      1     49.20722
    peak_3     A      1     38.54943
    peak_4     A      1     99.09472
    peak_1     B      2     105.21728
    peak_2     B      2     42.63114
    peak_3     B      2     78.00591
    peak_4     B      2     74.37773
    peak_1     C      2     509.30606
    peak_2     C      2     101.36234
    peak_3     C      2     25.17051
    peak_4     C      2     32.8804
    peak_1     D      1     35.37478
    peak_2     D      1     89.11722
    peak_3     D      1     112.24688
    peak_4     D      1     386.40139
    peak_1     E      3     631.07692
    peak_2     E      3     162.58791
    peak_3     E      3     46.93961
    peak_4     E      3     56.69035
    peak_1     F      2     38.7762
    peak_2     F      2     261.45587
    peak_3     F      2     43.99171
    peak_4     F      2     72.11012
    peak_1     G      1     118.5962
    peak_2     G      1     250.1178
    peak_3     G      1     84.35
    peak_4     G      1     386.40139
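To make the answers below reproducible, the table above can be loaded into R as `data2` with `read.table()`. This is a minimal sketch that assumes `Group` is read as an integer column and `PeakNumber` as character (not factor); the values are copied verbatim from the question.

```r
# Rebuild the question's table as a data frame named data2.
# stringsAsFactors = FALSE keeps PeakNumber and Sample as character columns.
data2 <- read.table(text = "
PeakNumber Sample Group MappedReadsCPM
peak_1 A 1 43.53819
peak_2 A 1 49.20722
peak_3 A 1 38.54943
peak_4 A 1 99.09472
peak_1 B 2 105.21728
peak_2 B 2 42.63114
peak_3 B 2 78.00591
peak_4 B 2 74.37773
peak_1 C 2 509.30606
peak_2 C 2 101.36234
peak_3 C 2 25.17051
peak_4 C 2 32.8804
peak_1 D 1 35.37478
peak_2 D 1 89.11722
peak_3 D 1 112.24688
peak_4 D 1 386.40139
peak_1 E 3 631.07692
peak_2 E 3 162.58791
peak_3 E 3 46.93961
peak_4 E 3 56.69035
peak_1 F 2 38.7762
peak_2 F 2 261.45587
peak_3 F 2 43.99171
peak_4 F 2 72.11012
peak_1 G 1 118.5962
peak_2 G 1 250.1178
peak_3 G 1 84.35
peak_4 G 1 386.40139
", header = TRUE, stringsAsFactors = FALSE)

# Check column types: PeakNumber/Sample character, Group integer, CPM numeric
str(data2)
```

Because `Group` is an integer here, both `Group %in% c("1", "2")` and `Group != 3` work: `%in%` coerces both sides to a common type before matching.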
JVGen
  • For the t-tests to make sense you must get rid of `Group == 3`. When you split the data by `PeakNumber`, groups 1 and 2 have the same number of rows (three each), but group 3 has only one datum, so the tests cannot be run. – Rui Barradas Jan 15 '20 at 19:15

2 Answers


You can use `sapply` to loop over all the unique peaks in your data and subset the data to each specific peak:

    pvalues <- sapply(unique(data2$PeakNumber), function(peak) {
      # keep only groups 1 and 2, and only the rows for the current peak
      t.test(MappedReadsCPM ~ Group,
             data = subset(data2, Group %in% c("1", "2") & PeakNumber == peak))$p.value
    })
GordonShumway
  • This is running, but it looks like it is saving the p-values in a list. I assume the p-values kept their order, e.g. the 1st p-value is from peak_1, the 2nd from peak_2? What's the easiest way to save this to a new data frame with PeakNumber and pvalue as columns? – JVGen Jan 15 '20 at 20:03
  • I think I got this with the following, as long as the p-value output is in the correct order: `peak_table <- distinct(data2, PeakNumber); pvalue_table <- peak_table %>% dplyr::mutate(pvalue = pvalues)` – JVGen Jan 15 '20 at 20:11
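On the ordering question in the comments: `unique(data2$PeakNumber)` is a character vector, and `sapply()` with its default `USE.NAMES = TRUE` names the result after a character input, so each p-value carries its peak name and the pairing does not depend on row order. Below is a minimal sketch that builds the data frame directly from those names, without dplyr. It assumes `PeakNumber` is character rather than factor (for a factor input `sapply()` adds no names); the data are the question's values rebuilt compactly.

```r
# Compact rebuild of the question's data: 4 peaks x 7 samples (A..G).
data2 <- data.frame(
  PeakNumber = rep(paste0("peak_", 1:4), times = 7),
  Sample = rep(c("A", "B", "C", "D", "E", "F", "G"), each = 4),
  Group = rep(c(1, 2, 2, 1, 3, 2, 1), each = 4),
  MappedReadsCPM = c(43.53819, 49.20722, 38.54943, 99.09472,
                     105.21728, 42.63114, 78.00591, 74.37773,
                     509.30606, 101.36234, 25.17051, 32.8804,
                     35.37478, 89.11722, 112.24688, 386.40139,
                     631.07692, 162.58791, 46.93961, 56.69035,
                     38.7762, 261.45587, 43.99171, 72.11012,
                     118.5962, 250.1178, 84.35, 386.40139),
  stringsAsFactors = FALSE
)

# One Welch t-test per peak, groups 1 vs 2 only (same approach as the answer).
pvalues <- sapply(unique(data2$PeakNumber), function(peak) {
  t.test(MappedReadsCPM ~ Group,
         data = subset(data2, Group %in% c(1, 2) & PeakNumber == peak))$p.value
})

# sapply() named the vector by peak, so use the names as the PeakNumber column.
pvalue_table <- data.frame(PeakNumber = names(pvalues),
                           pvalue = unname(pvalues),
                           stringsAsFactors = FALSE)
pvalue_table
```

Taking the names from the result itself, rather than from a separate `distinct()` call, guarantees each row pairs a peak with its own p-value.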

In your data it seems that t-tests cannot be run for `Group == 3`, so I start by subsetting the data to keep only groups 1 and 2.

    df_12 <- subset(data2, Group != 3)

Now split by PeakNumber and then lapply the tests. The output is a list of test results.

    sp <- split(df_12, df_12$PeakNumber)

    t_list <- lapply(sp, function(DF){
      t.test(MappedReadsCPM ~ Group, data = DF)
    })

This extracts the p-values from the results above.

    pvals <- sapply(t_list, '[[', 'p.value')

    pvals
    #   peak_1    peak_2    peak_3    peak_4 
    #0.4105493 0.9526529 0.3357703 0.1348856 
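With 4000 peaks, a single peak where `t.test()` fails (for example, a group with fewer than two values at that peak) stops the whole `lapply()` with an error. A sketch of one way around that, assuming an `NA` p-value is acceptable for such peaks: wrap each test in `tryCatch()`, collect the results with `vapply()`, and return a data frame. The `peak_5` rows below are hypothetical, added only to trigger the failure path; the other four peaks use the question's values.

```r
# Compact rebuild of the question's data: 4 peaks x 7 samples (A..G).
data2 <- data.frame(
  PeakNumber = rep(paste0("peak_", 1:4), times = 7),
  Sample = rep(c("A", "B", "C", "D", "E", "F", "G"), each = 4),
  Group = rep(c(1, 2, 2, 1, 3, 2, 1), each = 4),
  MappedReadsCPM = c(43.53819, 49.20722, 38.54943, 99.09472,
                     105.21728, 42.63114, 78.00591, 74.37773,
                     509.30606, 101.36234, 25.17051, 32.8804,
                     35.37478, 89.11722, 112.24688, 386.40139,
                     631.07692, 162.58791, 46.93961, 56.69035,
                     38.7762, 261.45587, 43.99171, 72.11012,
                     118.5962, 250.1178, 84.35, 386.40139),
  stringsAsFactors = FALSE
)

# Hypothetical peak with only one group-2 value, so its t-test errors.
bad <- data.frame(PeakNumber = "peak_5", Sample = c("A", "D", "G", "B"),
                  Group = c(1, 1, 1, 2), MappedReadsCPM = c(10, 12, 15, 20),
                  stringsAsFactors = FALSE)
data2 <- rbind(data2, bad)

df_12 <- subset(data2, Group != 3)            # drop group 3, as in the answer
sp <- split(df_12, df_12$PeakNumber)          # one data frame per peak

# One p-value per peak; any error from t.test() becomes NA instead of stopping.
pvals <- vapply(sp, function(DF) {
  tryCatch(t.test(MappedReadsCPM ~ Group, data = DF)$p.value,
           error = function(e) NA_real_)
}, numeric(1))

result <- data.frame(PeakNumber = names(pvals), pvalue = unname(pvals),
                     stringsAsFactors = FALSE)
result
```

Note that `split()` orders peaks alphabetically, so with many peaks `peak_10` sorts before `peak_2`. The `PeakNumber` column keeps each p-value attached to its own peak regardless of that order.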

Final cleanup:

    rm(df_12, sp)
Rui Barradas