I have 58 columns in each of my data frames, and I would like to test whether there is a significant difference between them, both column by column and as a whole. The 58 columns together make up a water basin, so they sum to the whole, but each one still represents something different. I am not sure how to run t.test on this. I am really new to coding and to R.
2 Answers
In the simplest case, you would loop through the columns and run a t-test on each pair of matching columns. One such comparison is shown below.
# Dataframe 1: Col 1: It has 100 values, mean = 1, SD = 1
df_1_col_1 = rnorm(100, 1, 1)
# Dataframe 2: Col 1: It has 75 values, mean = 2, SD = 1
df_2_col_1 = rnorm(75, 2, 1)
# Null hypothesis: the true difference in means between the two columns is 0
t.test(df_1_col_1, df_2_col_1)
# If the p-value is < 0.05, you reject the null hypothesis.
Or you can aggregate the 58 columns row-wise to get one value for each row, e.g. take the mean of the 58 column values. You then have one vector of values for dataframe 1 and one for dataframe 2 (playing the roles of df_1_col_1 and df_2_col_1 in the code above). If you don't like a simple mean, you can do PCA on your dataframes and use the 1st principal component from both dataframes to do a t-test.
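As a rough sketch of the two aggregation ideas above, the code below uses simulated stand-in data; the names df1, df2, row_means_1, scores_1 and so on are illustrative assumptions, not taken from the question. For the PCA variant it fits a single PCA on the stacked rows of both data frames rather than one PCA per data frame, because prcomp centers the data by default and separately fitted component scores would each average to zero. That is a substitution on my part, so treat it as one possible reading of the suggestion.

```r
set.seed(1)  # simulated stand-in data; replace with the real data frames
df1 <- as.data.frame(matrix(rnorm(100 * 58, mean = 1), ncol = 58))
df2 <- as.data.frame(matrix(rnorm(75 * 58, mean = 2), ncol = 58))

# Option 1: collapse each row to the mean of its 58 column values
row_means_1 <- rowMeans(df1)
row_means_2 <- rowMeans(df2)
res_mean <- t.test(row_means_1, row_means_2)
res_mean$p.value

# Option 2 (assumption: one PCA fitted on both data frames stacked together,
# which requires the two data frames to share the same column names)
pca <- prcomp(rbind(df1, df2), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]                       # scores on the 1st principal component
in_df1 <- seq_len(nrow(df1))            # the first rows came from df1
scores_1 <- pc1[in_df1]
scores_2 <- pc1[-in_df1]
res_pc <- t.test(scores_1, scores_2)
res_pc$p.value
```

Both variants reduce the 58 columns to a single number per row, so they answer the "as a whole" part of the question and say nothing about individual columns.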

Aman J
Here is a way of conducting t-tests on all columns of two data.frames using an lapply loop. Each of the tests returns a list of class "htest", and the sapply instructions extract the list members of interest.
tests_list <- lapply(seq_along(df1), function(i){
  t.test(df1[[i]], df2[[i]])
})
sapply(tests_list, '[[', 'statistic')
sapply(tests_list, '[[', 'p.value')
sapply(tests_list, '[[', 'conf.int')
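If a single table is easier to read than three separate sapply results, something along these lines might work. It is only a sketch on simulated data: pairing columns by name, the results data frame and its column names, and the Holm adjustment are my own assumptions rather than part of the approach above. The adjustment is optional and is included only because 58 separate tests will produce some small p-values by chance.

```r
set.seed(2021)  # simulated stand-in data with matching column names
df1 <- as.data.frame(matrix(rnorm(20 * 4), ncol = 4))
df2 <- as.data.frame(matrix(rnorm(20 * 4), ncol = 4))

# Pair columns by name so that column order does not matter
common <- intersect(names(df1), names(df2))
tests <- lapply(common, function(nm) t.test(df1[[nm]], df2[[nm]]))

# One row per column: t statistic, p-value and confidence interval bounds
results <- data.frame(
  column    = common,
  statistic = vapply(tests, function(tt) unname(tt$statistic), numeric(1)),
  p_value   = vapply(tests, function(tt) tt$p.value, numeric(1)),
  conf_low  = vapply(tests, function(tt) tt$conf.int[1], numeric(1)),
  conf_high = vapply(tests, function(tt) tt$conf.int[2], numeric(1))
)

# Optional (assumption): adjust for running many tests at once, Holm's method
results$p_adjusted <- p.adjust(results$p_value, method = "holm")
results
```

vapply is used instead of sapply here so that each extracted value is checked to be a single number.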
Test data
set.seed(2021)
n <- 20
df1 <- matrix(rnorm(n*4), ncol = 4)
df2 <- matrix(rnorm(n*4), ncol = 4)
df1 <- as.data.frame(df1)
df2 <- as.data.frame(df2)

Rui Barradas