I have a panel data set and want to create a matrix that looks like a correlation matrix, but where each cell holds the difference between the two t-test estimates (the group means) together with the t-statistic.

Using the `ToothGrowth` data, I first split the supp groups by their dose values, and I want to calculate the t-statistic for every possible combination of the subgroups.

I want my t-test matrix to look as follows:

         VC_all  VC_0.5  VC_1  VC_all  VC_0.5  VC_1  OJ_all        OJ_0.5  OJ_1
VC_all                                               -4 ( -1.92 )
VC_0.5
VC_1
VC_all
VC_0.5
VC_1
OJ_all
OJ_0.5
OJ_1

As an example, I filled in one value with the following code:

library(dplyr)  # provides filter() for data frames

t_test <- t.test(x = filter(ToothGrowth, supp == "VC")$len,
                 y = filter(ToothGrowth, supp == "OJ")$len, var.equal = TRUE)

Is there a faster way to do this that calculates the t-statistics for every pair of groups at once?

df["VC_all","OJ_all"] <- paste(round(t_test$estimate[1] - t_test$estimate[2]), 
                               "(", round(t_test$statistic,2), ")")
Jj Blevins
  • I had a look at the `ToothGrowth` data. I did not find the names you write (VC_0.5, ...). –  Jul 29 '19 at 16:16
  • Both the VC and OJ groups can be further grouped by their dose values, which can be 0.5 and 1.0. – Jj Blevins Jul 29 '19 at 16:38

1 Answer

You can use this:

# generate example data
df <- data.frame(matrix(rnorm(100 * 3), ncol = 3))
# name the columns
names(df) <- c("a", "b", "c")

# or, to use your own data, uncomment the next line and substitute its name
# df <- name_of_your_dataframe

# make an empty character matrix for the results, labelled by column name
results <- matrix(NA_character_, nrow = ncol(df), ncol = ncol(df),
                  dimnames = list(names(df), names(df)))
# between which columns do we need to run t-tests?
to_estimate <- t(combn(names(df), 2))
# one "difference (t-statistic)" string per pair of columns
cells <- apply(to_estimate, 1, function(to_estimate_i) {
  t_results <- t.test(df[, to_estimate_i[1]], df[, to_estimate_i[2]])
  paste0(round(t_results$estimate[1] - t_results$estimate[2], 2),
         " (", round(t_results$statistic, 2), ")")
})
# fill each cell by its (row, column) names; filling via upper.tri() would
# misplace values with more than three columns, because combn() and
# upper.tri() order the pairs differently
results[to_estimate] <- cells
# copy upper to lower
results[to_estimate[, 2:1, drop = FALSE]] <- cells
# convert to a dataframe
results <- as.data.frame(results)

All you need to do is replace name_of_your_dataframe with the name of your dataframe. The data must be in separate columns, one per group.
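Because the approach above expects one column per group, it does not directly fit groups of different sizes (for example, a whole supplement group against a single dose subgroup). The following is only a sketch of one way to adapt it to `ToothGrowth`, assuming the groups are held in a named list instead of a dataframe. The group labels, the helper `fmt_ttest`, and the choice of `var.equal = TRUE` (taken from the question) are illustrative assumptions, not part of the original answer.

```r
# ToothGrowth ships with base R (package "datasets")
data(ToothGrowth)

# Named list of length vectors: one "_all" group per supplement plus one
# group per supplement/dose combination (labels are illustrative)
by_supp <- split(ToothGrowth$len, ToothGrowth$supp)
names(by_supp) <- paste0(names(by_supp), "_all")
by_dose <- split(ToothGrowth$len,
                 paste(ToothGrowth$supp, ToothGrowth$dose, sep = "_"))
groups <- c(by_supp, by_dose)

# Hypothetical helper: "difference (t-statistic)" for two numeric vectors
fmt_ttest <- function(x, y) {
  tt <- t.test(x, y, var.equal = TRUE)
  paste0(round(tt$estimate[1] - tt$estimate[2], 2),
         " (", round(tt$statistic, 2), ")")
}

# All unordered pairs of group names, one pair per row
pairs <- t(combn(names(groups), 2))

# Character matrix labelled by group name; only the upper triangle is filled
res <- matrix(NA_character_, length(groups), length(groups),
              dimnames = list(names(groups), names(groups)))
res[pairs] <- apply(pairs, 1, function(p) {
  fmt_ttest(groups[[p[1]]], groups[[p[2]]])
})
```

Indexing `res` with the two-column character matrix `pairs` places each result by row and column name, so the cell order does not depend on how `combn()` enumerates the pairs. Note that some pairs overlap (a dose subgroup is contained in its "_all" group), so those tests do not compare independent samples.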

marc_s
  • If I replaced df with the name of my dataframe, my dataframe would be overwritten after the first line of code, if I am not mistaken? – Jj Blevins Jul 29 '19 at 14:36
  • What is the use of this: `df <- data.frame(matrix(rnorm(100*3), ncol= 3)) # name data names(df) <- c("a", "b", "c")` In the line afterwards you overwrite df. And still, exchanging df with my data.frame would lead to the loss of my data. – Jj Blevins Jul 29 '19 at 14:50
  • @JjBlevins Using `df <- data.frame(matrix(rnorm(100*3), ncol= 3)) # name data names(df) <- c("a", "b", "c")` shows you, with some generated data, how the code works (try it). No, do not overwrite your dataframe with df, but the other way around (overwrite df with your dataframe, as in the code). You can simply replace name_of_your_dataframe with the actual name of your dataframe and run the code. But it only works if you have the data in separate columns. I could do that, but I don't have the data (see the comment under your question). –  Jul 29 '19 at 16:17
  • I now realize how to apply your code to my data. It works amazingly well. Thank you very much. Do you know how to insert a line break into paste? – Jj Blevins Jul 29 '19 at 17:19
  • @JjBlevins usually with "\n" but I don't think it works with dataframes. Maybe you can ask it as a new question if it does not work out. Happy that my code was helpful –  Jul 29 '19 at 18:03
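
As a small illustration of the line-break point in the last comment (a sketch only; the values are taken from the example cell in the question and the variable names are made up): the break is stored in the string, but whether it is shown as a real break depends on how the string is displayed.

```r
# Join the difference and the t-statistic with a newline instead of a space
cell <- paste("-4", "(-1.92)", sep = "\n")

# cat() writes the raw characters, so the break appears as a real line break
cat(cell, "\n")

# print() displays the escape sequence "\n" rather than breaking the line
print(cell)

# The string can be stored in a dataframe; printing the dataframe goes
# through print-style formatting, so expect the escape rather than a break
df_cell <- data.frame(VC_all = cell, stringsAsFactors = FALSE)
print(df_cell)
```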