Given a matrix m, how can I run a t.test on each row (testing whether the row mean differs from zero) and get a matrix whose columns hold, e.g., t.test$statistic and t.test$p.value for each row? Some rows contain several NAs, and I want to make sure t.test doesn't fail on them; for such a row, the result should be NA in both the statistic and p-value columns. I thought of something like what's shown below, but I cannot get it right. In the end I need to do this on a list of matrices, but I figure that once I can do it on a single matrix I can use lapply on the list. Thanks!

res <- apply(m, 1, function(x) {
u <- matrix(NA, nrow = nrow(m), ncol = 4, dimnames = list(
    c(rownames(m)),
    c("Stats", "P-values")
    ))
if(sum(!is.na(x)) > 1)
    u[,1] <- t.test(x)$statistic
    u[,2] <- t.test(x)$p.value
else NA
return(u)
}
)
user3375672

2 Answers

You can do something along these lines:

Step 1: Generate a toy dataset

library(mvtnorm)
set.seed(1)
mat1 <- rmvnorm(n = 30, mean = sample(c(rep(0, 5), 1:5)), sigma = diag(10))
mat1[sample(seq(nrow(mat1) * ncol(mat1)), 5)] <- NA
mat1 <- t(mat1)
mat2 <- rmvnorm(n = 30, mean = sample(c(rep(0, 5), 1:5)), sigma = diag(10))
mat2[sample(seq(nrow(mat2) * ncol(mat2)), 5)] <- NA
mat2 <- t(mat2)
mat_list <- list(mat1, mat2)

Step 2: Create a helper function that runs the test and returns NA if the row contains any missing value

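t_test <- function(x) {
    # Return NA for both values if the row contains any missing value
    if (anyNA(x)) return(c(stat = NA_real_, p_val = NA_real_))
    tt <- t.test(x)  # one-sample t-test of the row mean against 0
    c(stat = unname(tt$statistic), p_val = tt$p.value)
}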

Step 3: Apply it row-wise to each matrix in the list

lapply(mat_list, function(m) t(apply(m, 1, t_test)))
## [[1]]
##           stat      p_val
##  [1,]  1.02334 3.1461e-01
##  [2,] -0.17025 8.6599e-01
##  [3,] -0.55501 5.8314e-01
##  [4,]       NA         NA
##  [5,]  1.48641 1.4796e-01
##  [6,]       NA         NA
##  [7,] 25.64252 1.7737e-21
##  [8,]       NA         NA
##  [9,] 24.50047 6.2831e-21
## [10,]       NA         NA

## [[2]]
##           stat      p_val
##  [1,]       NA         NA
##  [2,]       NA         NA
##  [3,] -0.44341 6.6076e-01
##  [4,]       NA         NA
##  [5,]  1.28913 2.0754e-01
##  [6,]       NA         NA
##  [7,]  4.86929 3.6477e-05
##  [8,] 16.59708 2.4032e-16
##  [9,]  0.54102 5.9263e-01
## [10,]       NA         NA
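As an alternative to calling t.test once per row, the one-sample statistic t = mean / (sd / sqrt(n)) can be computed for all rows at once. The following is only a sketch of that idea, not the method used above: row_t_test and its min_n argument are hypothetical names introduced here, and with the default min_n = 2 a row with some missing values is tested on its remaining values instead of being set to NA.

```r
# Vectorised one-sample t-test (H0: mean = 0) on every row of a numeric matrix.
# row_t_test and min_n are illustrative names, not part of the answer above.
row_t_test <- function(m, min_n = 2) {
    n  <- rowSums(!is.na(m))                        # non-missing values per row
    mu <- rowMeans(m, na.rm = TRUE)                 # row means
    v  <- rowSums((m - mu)^2, na.rm = TRUE) / (n - 1)  # row variances

    stat  <- rep(NA_real_, nrow(m))
    p_val <- rep(NA_real_, nrow(m))

    # Only rows with enough data and non-zero variance can be tested
    ok <- n >= min_n & !is.na(v) & v > 0
    stat[ok]  <- mu[ok] / sqrt(v[ok] / n[ok])
    p_val[ok] <- 2 * pt(-abs(stat[ok]), df = n[ok] - 1)  # two-sided p-value

    out <- cbind(stat = stat, p_val = p_val)
    rownames(out) <- rownames(m)
    out
}

# Small example: row 1 is partly missing, row 2 is entirely missing
set.seed(42)
m <- matrix(rnorm(40), nrow = 4)
m[1, 1:3] <- NA
m[2, ] <- NA
res <- row_t_test(m)
res
```

For a list of matrices the same helper can be passed to lapply, e.g. lapply(mat_list, row_t_test). Setting min_n = ncol(m) should reproduce the stricter "NA if anything is missing" rule.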
dickoa
  • Thanks dickoa, it is exactly what I needed. Very elegant solution for an R newbie; it saved me a lot of time and taught me a thing or two. – user3375672 Mar 16 '14 at 14:57

Using only complete rows is too strict in many cases. To compute a p-value, t.test needs at least two non-missing values, so I would prefer something like this:

t_test <- function(x) {
    c(stat  = ifelse(sum(!is.na(x)) > 1, t.test(x)$statistic, NA),
      p_val = ifelse(sum(!is.na(x)) > 1, t.test(x)$p.value, NA))
}

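You can use the apply function with MARGIN = 1, which applies the function to each row. apply returns one column per input row, so wrap the call in t() to get one row per input row (here m is your matrix):

t(apply(m, MARGIN = 1, FUN = t_test))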
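To connect this back to the list of matrices in the question, here is a minimal end-to-end sketch of the "at least two non-missing values" rule. The names t_test_lenient, m1, m2 and mats are made up for illustration, and the data are random toy values.

```r
# Lenient helper: test a row whenever at least two non-missing values remain.
# t_test_lenient is a hypothetical name used only in this sketch.
t_test_lenient <- function(x) {
    # t.test() stops with an error when fewer than two values are available
    if (sum(!is.na(x)) < 2) return(c(stat = NA_real_, p_val = NA_real_))
    tt <- t.test(x)  # t.test() drops the NAs itself
    c(stat = unname(tt$statistic), p_val = tt$p.value)
}

# Toy data: two 4 x 5 matrices, the first with missing values
set.seed(1)
m1 <- matrix(rnorm(20), nrow = 4)
m1[1, 1:2] <- NA   # three values left: still tested
m1[2, 1:4] <- NA   # one value left: result is NA
m2 <- matrix(rnorm(20, mean = 2), nrow = 4)
mats <- list(a = m1, b = m2)

# One result matrix per input matrix, one row per input row
res <- lapply(mats, function(m) t(apply(m, 1, t_test_lenient)))
res$a
```

Calling t.test once and reusing the result avoids running the test twice per row, which is what happens when it is called separately for the statistic and the p-value.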

cpeikert