I have 58 columns in each of my data frames, and I would like to test whether there is a significant difference between the data frames, both column by column and as a whole. Each of the 58 columns is one component of a water basin, so together they sum to the whole, but individually they represent different things. I am not sure how to run a t.test on this. I am really new to coding and to R.

Phil
LARA

2 Answers

In the simplest case, you would loop over the columns and run one t-test per pair of matching columns. A single such test is shown below.

# Data frame 1, column 1: 100 values, mean = 1, SD = 1
df_1_col_1 = rnorm(100, 1, 1)

# Data frame 2, column 1: 75 values, mean = 2, SD = 1
df_2_col_1 = rnorm(75, 2, 1)

# Null hypothesis: the difference between the two means is 0
t.test(df_1_col_1, df_2_col_1)

# If the p-value is below 0.05, you reject the null hypothesis at the 5% level.

Alternatively, you can aggregate the 58 columns row-wise to get one value per row, for example by taking the mean of the 58 column values. That gives you one vector of values for each data frame (playing the role of df_1_col_1 and df_2_col_1 in the code above), which you can pass to t.test. If a simple mean does not suit your data, you can run PCA on your data frames and use the first principal component from each in the t-test.
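The row-wise aggregation and the PCA variant described above could be sketched roughly as follows. The data frames df_1 and df_2 are simulated stand-ins with 5 columns instead of 58, and every name here is an assumption made for illustration. Fitting one PCA on the stacked data, rather than one PCA per data frame, is also an assumption of this sketch; it keeps the scores from both data frames on the same axis so they can be compared.

```r
set.seed(1)

# Hypothetical stand-ins for the two data frames (5 columns instead of 58)
df_1 <- as.data.frame(matrix(rnorm(100 * 5, mean = 1), ncol = 5))
df_2 <- as.data.frame(matrix(rnorm(75 * 5, mean = 2), ncol = 5))

# Option 1: collapse each row to a single value, then compare the two vectors
agg_1 <- rowMeans(df_1)
agg_2 <- rowMeans(df_2)
res_mean <- t.test(agg_1, agg_2)

# Option 2: score every row on the first principal component.
# Assumption: the PCA is fitted once on the stacked rows of both data frames.
pca <- prcomp(rbind(df_1, df_2), center = TRUE, scale. = TRUE)
pc1 <- pca$x[, 1]
group <- rep(c("df_1", "df_2"), times = c(nrow(df_1), nrow(df_2)))
res_pca <- t.test(pc1 ~ group)

res_mean$p.value
res_pca$p.value
```

If the columns are parts that add up to a basin total, rowSums() may be a more natural aggregate than rowMeans(); the rest of the sketch would stay the same.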

Aman J

Here is a way of conducting t-tests on all columns of two data.frames using an lapply loop. Each test returns a list of class "htest", and the sapply calls extract the list members of interest.

tests_list <- lapply(seq_along(df1), function(i){
  t.test(df1[[i]], df2[[i]])
})

sapply(tests_list, '[[', 'statistic')
sapply(tests_list, '[[', 'p.value')
sapply(tests_list, '[[', 'conf.int')

Test data

set.seed(2021)
n <- 20
df1 <- matrix(rnorm(n*4), ncol = 4)
df2 <- matrix(rnorm(n*4), ncol = 4)
df1 <- as.data.frame(df1)
df2 <- as.data.frame(df2)
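
With 58 columns, it may be easier to read the results as one table, and running that many tests at once usually calls for a multiple-comparison adjustment. The following is a minimal sketch on the same test data; the results data frame, its column names, and the choice of the Holm method are assumptions added here, not part of the approach above.

```r
set.seed(2021)
n <- 20
df1 <- as.data.frame(matrix(rnorm(n * 4), ncol = 4))
df2 <- as.data.frame(matrix(rnorm(n * 4), ncol = 4))

# One t-test per pair of columns, matched by position
tests_list <- lapply(seq_along(df1), function(i) t.test(df1[[i]], df2[[i]]))

# Collect the statistic and p-value of each test in one row per column
results <- data.frame(
  column    = names(df1),
  statistic = sapply(tests_list, function(x) unname(x$statistic)),
  p_value   = sapply(tests_list, function(x) x$p.value)
)

# Assumption: Holm adjustment to account for running several tests at once
results$p_adjusted <- p.adjust(results$p_value, method = "holm")
results
```

Note that the columns are paired by position, so this assumes both data frames list the 58 columns in the same order.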
Rui Barradas