I have a data.frame, which is similar to this one:

cb <- data.frame(group = c("A", "B", "C", "D", "E"), WC = runif(100, 0, 100), Ana = runif(100, 0, 100), Clo = runif(100, 0, 100))

Structure of the actual dataframe:

str(cb)
data.frame: 66936 obs of 89 variables:
$ group: Factor w/ 5 levels "A", "B", "C" ...
$ WC: int 19 28 35 92 10 23 ...
$ Ana: num 17.2 48 35.4 84.2 ...
$ Clo: num 37.2 12.1 45.4 38.9 ...
....

mean <- colMeans(cb[,2:89])
mean
WC     Ana    Clo    ...
52.45  37.23  50.12  ...

I want to perform a one-sample t-test for every group and every variable, testing each group's values against the overall mean of that variable.

For that I did the following:

A <- subset(cb, cb$group == "A")
B <- subset(cb, cb$group == "B")
...

t_A_WC <- t.test(A$WC, mu = mean[1], alternative = "two.sided")
t_B_WC <- t.test(B$WC, mu = mean[1], alternative = "two.sided")
....

t_A_Ana <- t.test(A$Ana, mu = mean[2], alternative = "two.sided")
t_B_Ana <- t.test(B$Ana, mu = mean[2], alternative = "two.sided")
....

t_A_Clo <- t.test(A$Clo, mu = mean[3], alternative = "two.sided")
t_B_Clo <- t.test(B$Clo, mu = mean[3], alternative = "two.sided")
....

The results seem to be correct, but typing this out for every group and every variable is very time-consuming.

Is there a smarter way to do that?

What I have tried:

From here

results <- lapply(mydf, t.test)
resultsmatrix <- do.call(cbind, results)
resultsmatrix[c("statistic","estimate","p.value"),]

But the results are very wrong and do not match the values I calculated earlier.
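
One likely reason for the mismatch (an assumption, since `mydf` is not shown): `t.test` defaults to `mu = 0`, and `lapply` over a data frame runs one test per column on all rows, so the grouping and the column means never enter the test. A small sketch with simulated data shows the difference in the null hypothesis:

```r
set.seed(1)  # simulated data, only for illustration
x <- runif(20, 0, 100)

default_test <- t.test(x)           # no mu given, so the null mean is 0
mean_test    <- t.test(x, mu = 50)  # explicit null mean, as in the manual calls

default_test$null.value  # null hypothesis mean used by the default call
mean_test$null.value     # null hypothesis mean used by the explicit call
```

The first call tests against 0 and the second against 50, so their statistics and p-values are not comparable.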

EDIT:

Here is a link to a 10,000-row sample from the actual dataset.

Arthur Pennt
  • Do you want to get all the values of a t-test result (i.e. store the list of results in a list), or just certain statistics (e.g. store the p-value of each test in a vector)? – carlo Jul 28 '16 at 14:29
  • I need all the values, that is, t-value, parameter, p.value, etc. – Arthur Pennt Jul 28 '16 at 14:31
  • I have posted an idea below that captures all results. I can't test the correctness of the output, though, since there isn't any. I hope that helps. – carlo Jul 28 '16 at 14:48
  • I added a link to the dataset. Maybe that makes things clearer... – Arthur Pennt Jul 28 '16 at 15:55

2 Answers

This approach might be a bit lengthy, but I think it captures all the combinations you are looking for ("A" with "WC", "Ana", "Clo"; "B" with "WC", "Ana", "Clo"; etc.). All in all, that is 5 groups * 3 variables = 15 t-test results.

cb <- data.frame(group = c("A", "B", "C", "D", "E"), WC = runif(100, 0, 100), Ana = runif(100, 0, 100), Clo = runif(100, 0, 100))

mean <- colMeans(cb[,2:4])
varNames <- names(cb)[-1]   # removing group variable from list of variables


# t-test results are stored in a list of lists
master <- list()

# the for loop subsets by group; lapply runs a t-test for every variable in the subset
for (group in unique(cb$group)){
  # create a list of t-test results for the given "group" subset
  results <- lapply(seq_along(varNames), FUN = function(x, subset = cb[cb$group == group,]) {
    t.test(subset[varNames[x]], mu = mean[x], alternative = "two.sided")
  })

  master[[group]] <- results
}

# so, for example, to get the results for group "A" and "WC":
master[["A"]][[1]]   # index 1 because "WC" is the first element of varNames

#   One Sample t-test
# 
# data:  subset[varNames[x]]
# t = -0.417, df = 19, p-value = 0.6813
# alternative hypothesis: true mean is not equal to 46.5857
# 95 percent confidence interval:
#  30.27709 57.47510
# sample estimates:
# mean of x 
#  43.87609 

# from there you can just find your relevant statistic, for example

master[["A"]][[1]]$statistic   # gives the t-statistic (eg. $statistic, $p.value, etc.)

#         t 
# -0.4170353
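
If a flat table is easier to work with than a nested list, the same group-by-variable tests can be collected into a data frame. This is a minimal sketch rather than the loop above: it swaps in `split()` for the `for` loop, runs on simulated data shaped like `cb`, and uses the hypothetical names `col_means` and `summary_df`.

```r
set.seed(42)  # simulated data, same shape as the example above
cb <- data.frame(group = rep(c("A", "B", "C", "D", "E"), 20),
                 WC = runif(100, 0, 100), Ana = runif(100, 0, 100),
                 Clo = runif(100, 0, 100))
col_means <- colMeans(cb[, -1])  # overall mean of each variable
varNames  <- names(cb)[-1]

# one t-test per group and variable, stored as a nested named list
master <- lapply(split(cb, cb$group), function(sub) {
  tests <- lapply(varNames, function(v) t.test(sub[[v]], mu = col_means[[v]]))
  setNames(tests, varNames)
})

# flatten to one row per group/variable pair
summary_df <- do.call(rbind, lapply(names(master), function(g) {
  do.call(rbind, lapply(varNames, function(v) {
    tt <- master[[g]][[v]]
    data.frame(group = g, variable = v,
               statistic = unname(tt$statistic),
               df = unname(tt$parameter),
               p.value = tt$p.value,
               estimate = unname(tt$estimate))
  }))
}))
head(summary_df)
```

Because the inner lists are named here, a single result can also be looked up by name, e.g. `master[["A"]][["WC"]]`, instead of by position.
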
carlo
  • I got the error: unexpected symbol in " for (group in unique(cb$group)){subset(results <- lapply((1:length(varNames)), FUN = function(x, subset = cb[cb$group == group,]) {t.test(subset[varNames[x]], mu = mean[x], alternative =". Do you know what I can do to fix this? – Arthur Pennt Jul 28 '16 at 15:44
  • @8bytez, I think there was an error in copying the code. There is an extra 'subset' in the error message that you pasted. – carlo Jul 28 '16 at 19:19

First, let's initialise a results matrix and group levels.

res <- matrix(NA, ncol=5, 
    dimnames=list(NULL, c("group", "col", "statistic", "estimate", "p.value")))
gr <- levels(factor(cb$group))   # factor() so this also works if group is a character column

Then we loop through all columns for which to calculate the t.test, subsetting each for every available group.

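for(cl in 2:ncol(cb)){
    for(grp in gr){
        temp <- cb[cb$group == grp, cl]
        tt <- t.test(temp, mu = mean(cb[,cl]), alternative="two.sided")
        # select by name: in unlist(tt), position 5 is the upper confidence limit, not the estimate
        res <- rbind(res, c(grp, colnames(cb)[cl], 
            tt$statistic, tt$estimate, tt$p.value))
    }
}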

And finally, we reformat the results table.

res <- data.frame(res[-1,])
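
One thing to watch: because each row is built with `c()` from strings and numbers, the matrix holds character values, so the statistics arrive in the data frame as text. The sketch below shows one way to convert them back. It runs on simulated data, and `rows` and `num_cols` are hypothetical names used only for this illustration.

```r
set.seed(1)  # simulated data, only for illustration
cb <- data.frame(group = rep(c("A", "B"), each = 10), WC = runif(20, 0, 100))

rows <- lapply(c("A", "B"), function(grp) {
  tt <- t.test(cb$WC[cb$group == grp], mu = mean(cb$WC))
  # c() with a string coerces the numbers to character, as in the loop above
  c(group = grp, col = "WC", statistic = unname(tt$statistic),
    estimate = unname(tt$estimate), p.value = tt$p.value)
})
res <- data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# convert the statistics back to numeric; group and col stay character
num_cols <- c("statistic", "estimate", "p.value")
res[num_cols] <- lapply(res[num_cols], as.numeric)
str(res)
```

The round trip through character may drop the last few digits of each value, which is usually irrelevant for reporting but worth knowing when comparing against the raw `t.test` output.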
nya
  • I got the error message: `Error: unexpected symbol in "forfor(cl in 2:ncol(cb)){for(grp in gr){temp <- cb[cb$group == grp, cl] res"`. Do you know what that means? – Arthur Pennt Jul 28 '16 at 15:39
  • @8bytez It's a typo. Here, you have `forfor` instead of `for` as far as I can see. – nya Jul 28 '16 at 15:49
  • Oh yeah, but the typo is only here, not in my RStudio, so the problem remains. – Arthur Pennt Jul 28 '16 at 15:51
  • I added a link to the actual dataset. Maybe that makes things clearer... – Arthur Pennt Jul 28 '16 at 15:55
  • @8bytez Okay. The error can happen when a bracket is not closed, but the snippet you posted looks fine. The code works for me with copy and paste from here. Perhaps try copying it one row at a time. – nya Jul 28 '16 at 15:55
  • Ah, I retried it. It seems to have been a typo. It works now. Thank you for your great work!! – Arthur Pennt Jul 28 '16 at 16:00