My dataset looks something like this:

mat <- matrix(rnorm(12*4), ncol = 12)
colnames(mat) <- c("AC-1", "AC-2", "AC-3", "AM-1", "AM-2", "AM-3", "SC-1", "SC-2", "SC-3", "SM-1", "SM-2", "SM-3")
# data.frame() converts the hyphens in the column names to dots (AC-1 becomes AC.1)
df <- data.frame(compound = c("alanine", "arginine", "asparagine", "aspartate"), mat)
 df
compound        AC.1        AC.2       AC.3       AM.1       AM.2            AM.3       SC.1       SC.2       SC.3         SM.1
1   alanine   1.18362683 -2.03779314 -0.7217692 -1.7569264 -0.8381042      0.06866567  0.2327702 -1.1558879  1.2077454  0.437707310
2   arginine -0.19610110  0.05361113  0.6478384 -0.1768597  0.5905398     -0.67945600 -0.2221109  1.4032349  0.2387620  0.598236199
3 asparagine  0.02540509  0.47880021 -0.1395198  0.8394257  1.9046667      0.31175358 -0.5626059  0.3596091 -1.0963363 -1.004673116
4  aspartate -1.36397906  0.91380826  2.0630076 -0.6817453 -0.2713498     -2.01074098  1.4619707 -0.7257269  0.2851122 -0.007027878

I want to run a t-test for each row (compound), treating columns 2:4 (the AC replicates) as one group and columns 5:7 (the AM replicates) as the other, and store all the p-values. In other words, I want to see whether the AC and AM groups differ for each compound.

I am aware there is another topic on this, but I couldn't find a viable solution for my problem there.

PS. My real dataset has about 35,000 rows, so it may need a different solution than one that works for only 4 rows.

Karolis Koncevičius

2 Answers

1

After selecting the columns of interest, use pmap_dbl to apply t.test to each row, passing the first three values (AC) and the next three (AM) as the two samples, then bind the extracted p-values to the original data as a new column.

library(tidyverse)
df %>% 
  select(AC.1:AM.3) %>%
  pmap_dbl(~ c(...) %>%
               {t.test(.[1:3], .[4:6])$p.value}) %>% 
  bind_cols(df, pval_AC_AM = .)
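
For comparison, the same row-wise test can be sketched in base R without the tidyverse. This is only a minimal sketch: it assumes the data frame is built as in the question (the seed is added here purely for reproducibility) and that columns 2:4 and 5:7 hold the AC and AM replicates.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
mat <- matrix(rnorm(12 * 4), ncol = 12)
colnames(mat) <- c("AC-1", "AC-2", "AC-3", "AM-1", "AM-2", "AM-3",
                   "SC-1", "SC-2", "SC-3", "SM-1", "SM-2", "SM-3")
df <- data.frame(compound = c("alanine", "arginine", "asparagine", "aspartate"), mat)

# Pull the two groups out as numeric matrices (one row per compound)
ac <- as.matrix(df[, 2:4])
am <- as.matrix(df[, 5:7])

# Run t.test (Welch's test by default) row by row and keep only the p-value
pvals <- vapply(
  seq_len(nrow(df)),
  function(i) t.test(ac[i, ], am[i, ])$p.value,
  numeric(1)
)

# Assign the result; without an assignment the p-values are only printed
df$pval_AC_AM <- pvals
```

Note that the result has to be assigned (here to a new column) to be kept. The pipelines in this answer print their result unless it is assigned to a variable as well.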

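Alternatively, after selecting the columns, use gather to convert the data to 'long' format, separate the group label from the replicate number, spread the groups back into columns, apply t.test in summarise, and join the result with the original data.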

df %>%
  select(compound, AC.1:AM.3) %>% 
  gather(key, val, -compound) %>% 
  separate(key, into = c('key1', 'key2')) %>%
  spread(key1, val) %>% 
  group_by(compound) %>%
  summarise(pval_AC_AM = t.test(AC, AM)$p.value) %>% 
  right_join(df)
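
In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider, so the long-format approach might be written roughly as follows. This is a sketch, not a drop-in replacement: the column names `group`, `replicate` and `val` are made up for illustration, and the example data and seed are assumptions that mirror the question.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
mat <- matrix(rnorm(12 * 4), ncol = 12)
colnames(mat) <- c("AC-1", "AC-2", "AC-3", "AM-1", "AM-2", "AM-3",
                   "SC-1", "SC-2", "SC-3", "SM-1", "SM-2", "SM-3")
df <- data.frame(compound = c("alanine", "arginine", "asparagine", "aspartate"), mat)

res <- df %>%
  select(compound, AC.1:AM.3) %>%
  # Split names such as "AC.1" into a group label ("AC") and a replicate ("1")
  pivot_longer(-compound,
               names_to = c("group", "replicate"),
               names_sep = "\\.",
               values_to = "val") %>%
  group_by(compound) %>%
  # One t-test per compound: AC values against AM values
  summarise(pval_AC_AM = t.test(val[group == "AC"], val[group == "AM"])$p.value) %>%
  # Attach the p-values to the full original data
  right_join(df, by = "compound")
```

Assigning the pipeline to `res` keeps the joined table instead of only printing it.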

Update

If the values in a row are constant within both groups, t.test throws an error ("data are essentially constant"). One option is to return NA for those rows instead of failing, which can be done by wrapping the call with possibly.

posttest <- possibly(function(x, y) t.test(x, y)$p.value, otherwise = NA)
df %>% 
  select(AC.1:AM.3) %>%
  pmap_dbl(~ c(...) %>%
               {posttest(.[1:3], .[4:6])}) %>% 
  bind_cols(df, pval_AC_AM = .)

posttest(rep(3,5), rep(1, 5))
#[1] NA
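
The same fallback can be sketched in base R with tryCatch, for cases where purrr is not available. The helper name `safe_pval` is hypothetical and only used for illustration.

```r
# Hypothetical helper: return the t-test p-value, or NA if t.test() fails
safe_pval <- function(x, y) {
  tryCatch(
    t.test(x, y)$p.value,
    error = function(e) NA_real_  # e.g. "data are essentially constant"
  )
}

# Both groups are constant, so t.test() errors and the helper returns NA
constant_case <- safe_pval(rep(3, 5), rep(1, 5))

# Ordinary data still yields a numeric p-value
regular_case <- safe_pval(c(1, 2, 3), c(4, 5, 7))
```

Returning `NA_real_` rather than plain `NA` keeps the result a double, which is convenient when the values are collected into a numeric vector.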
akrun
  • i get an error with the first one Error in UseMethod("select_") : no applicable method for 'select_' applied to an object of class "c('matrix', 'double', 'numeric')" – Nathan Weesie Jul 05 '18 at 17:13
  • @NathanWeesie You showed the data as `data.frame` while the error suggest you have a `matrix`. Please check the `str(df)` of your original data. If it is a matrix, convert to data.frame i.e. `as.data.frame(df)` – akrun Jul 05 '18 at 17:14
  • aah you're right! now i get this error though Error in t.test.default(.[1:3], .[4:6]) : data are essentially constant – Nathan Weesie Jul 05 '18 at 17:18
  • @NathanWeesie That is the problem in your data, i.e. `t.test(rep(3, 5), rep(1, 5))` You may need a condition to check that or use `possibly` – akrun Jul 05 '18 at 17:19
  • @NathanWeesie Please check the update. I think it should work now – akrun Jul 05 '18 at 17:24
  • it only prints the values in my df. – Nathan Weesie Jul 05 '18 at 17:46
0

If you can use an external library:

library(matrixTests)

row_t_welch(df[,2:4], df[,5:7])$pvalue
[1] 0.67667626 0.39501003 0.26678161 0.01237438
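
If installing a package is not an option, a vectorised Welch test can also be sketched in base R, which avoids calling t.test once per row on a dataset with tens of thousands of rows. This is not how matrixTests is implemented; it is only an illustration of the standard Welch formulas, and the function name `welch_rows` is hypothetical.

```r
# Hypothetical helper: two-sided Welch t-test p-values, one per row
welch_rows <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  nx <- ncol(x)
  ny <- ncol(y)
  mx <- rowMeans(x)
  my <- rowMeans(y)
  # Row-wise sample variances (mx and my are recycled down each column)
  vx <- rowSums((x - mx)^2) / (nx - 1)
  vy <- rowSums((y - my)^2) / (ny - 1)
  se2 <- vx / nx + vy / ny                 # squared standard error
  tstat <- (mx - my) / sqrt(se2)
  # Welch-Satterthwaite degrees of freedom
  dof <- se2^2 / ((vx / nx)^2 / (nx - 1) + (vy / ny)^2 / (ny - 1))
  2 * pt(-abs(tstat), dof)
}

# Example with made-up data: 10 rows, 3 replicates per group
set.seed(1)
x <- matrix(rnorm(30), ncol = 3)
y <- matrix(rnorm(30), ncol = 3)
p <- welch_rows(x, y)
```

With the question's data this would be called as `welch_rows(df[, 2:4], df[, 5:7])`. Rows that are constant in both groups come out as NaN here rather than raising an error.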
Karolis Koncevičius