
I need to run a 2-sample independent t-test, comparing Column1 to Column2. But Column1 is in DataframeA, and Column2 is in DataframeB. How should I do this?

Just in case relevant (feel free to ignore): I am a true beginner. My experience with R so far has been limited to running 2-sample matched t-tests within the same data frame by doing the following:

library(tidyr)  # provides gather() and the %>% pipe

t.test(response ~ Column1,
       data = Dataframe1 %>%
         gather(key = "Column1", value = "response", "Column1", "Column2"),
       paired = TRUE)
StupidWolf
rrrookie

2 Answers


TL;DR


t_test_result = t.test(DataframeA$Column1, DataframeB$Column2, paired=TRUE)

Explanation


If the data is paired, I assume that both dataframes will have the same number of observations (same number of rows). You can check this with nrow(DataframeA) == nrow(DataframeB).

You can think of each column of a dataframe as a vector (an ordered list of values). You have been calling t.test with a formula (y ~ x), which essentially says: given the dataframe specified in data, perform a t-test to assess the significance of the difference in means of the variable response between the paired groups in Column1.

Another way of thinking about this is by grabbing the data in data and separating it into two vectors: the vector with observations for the first group of Column1, and the one for the second group. Then, for each vector, you compute the mean and stdev and apply the appropriate formula that will give you the t statistic and hence the p value.

Thus, you can just extract those 2 vectors separately and provide them as arguments to the t.test() function. I hope it was beginner-friendly enough ^^ otherwise let me know
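To make the "two vectors" idea concrete, here is a minimal sketch. The numbers are made up for illustration, and paired is left at its default of FALSE because the question asks for an independent test. It only shows that passing the two columns directly is the same comparison as the formula call on stacked data:

```r
# Toy data (invented for illustration); the two data frames have different row counts
DataframeA <- data.frame(Column1 = c(5.1, 4.8, 6.0, 5.5, 5.9, 4.7))
DataframeB <- data.frame(Column2 = c(6.2, 6.8, 5.9, 7.1))

# Vector interface: pull each column out with $ and pass both to t.test()
res_vectors <- t.test(DataframeA$Column1, DataframeB$Column2)

# Formula interface: stack the values into one long data frame with a group label
long <- data.frame(
  response = c(DataframeA$Column1, DataframeB$Column2),
  group    = rep(c("A", "B"), times = c(nrow(DataframeA), nrow(DataframeB)))
)
res_formula <- t.test(response ~ group, data = long)

# Both calls describe the same comparison, so the t statistic and p-value agree
all.equal(unname(res_vectors$statistic), unname(res_formula$statistic))
all.equal(res_vectors$p.value, res_formula$p.value)
```

The vector form saves you from reshaping anything when the columns live in different data frames.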


EDIT: a few additions (I was going to reply in the comments but realized I did not have space hehe)

Regarding what @Ashish did: setting var.equal = FALSE is what makes it a Welch's test (it is also the default, which is why paired = FALSE alone gave you Welch's). To get a Student's t-test, set var.equal = TRUE. The paired parameter controls whether the t-test is run on paired samples or not, and since your data frames have an unequal number of rows, I suspect the observations are not matched, so leave paired at FALSE.
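As a rough sketch of the difference (toy numbers, invented for illustration, with the data frame and column names taken from the question), the only thing that changes between the two tests is var.equal:

```r
# Toy data (invented for illustration) with unequal numbers of rows
DataframeA <- data.frame(Column1 = c(5.1, 4.8, 6.0, 5.5, 5.9, 4.7))
DataframeB <- data.frame(Column2 = c(6.2, 6.8, 5.9, 7.1))

# Student's t-test: independent samples, pooled (equal) variance
student <- t.test(DataframeA$Column1, DataframeB$Column2,
                  paired = FALSE, var.equal = TRUE)

# Welch's t-test: independent samples, variances not assumed equal (the default)
welch <- t.test(DataframeA$Column1, DataframeB$Column2,
                paired = FALSE, var.equal = FALSE)

student$method     # name of the test that was actually run
welch$method
student$parameter  # degrees of freedom; for Student's test this is n1 + n2 - 2
```

Checking the method element of the result (or the title line when you print it) is a quick way to confirm which test actually ran.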

As for the Cohen's d effect size, you can check this stats exchange question, from which I copy the code:

For context, m1 and m2 are the groups' means (which you can get with m1 = mean(DataframeA$Column1)), s1 and s2 are the standard deviations (s2 = sd(DataframeB$Column2)), and n1 and n2 are the sample sizes (n2 = length(DataframeB$Column2)).

lx <- n1 - 1  # degrees of freedom for group 1 (observations minus 1)
ly <- n2 - 1  # degrees of freedom for group 2 (observations minus 1)

md  <- abs(m1-m2)        ## mean difference (numerator)
csd <- lx * s1^2 + ly * s2^2
csd <- csd/(lx + ly)
csd <- sqrt(csd)                     ## common sd computation

cd  <- md/csd                        ## cohen's d
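If it helps, the same computation can be wrapped into one self-contained function. This is only a sketch: cohens_d is a hypothetical helper name (not a base R function), and the data are toy numbers invented for illustration. It also shows how d relates to the Student's t statistic, since both use the same pooled standard deviation:

```r
# Hypothetical helper (not part of base R): Cohen's d using the pooled SD
cohens_d <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  # Pooled standard deviation, weighting each variance by its degrees of freedom
  pooled_sd <- sqrt(((n1 - 1) * sd(x)^2 + (n2 - 1) * sd(y)^2) / (n1 + n2 - 2))
  abs(mean(x) - mean(y)) / pooled_sd
}

# Toy data (invented for illustration)
DataframeA <- data.frame(Column1 = c(5.1, 4.8, 6.0, 5.5, 5.9, 4.7))
DataframeB <- data.frame(Column2 = c(6.2, 6.8, 5.9, 7.1))

d <- cohens_d(DataframeA$Column1, DataframeB$Column2)

# Link to the t-test: for Student's test, d = |t| * sqrt(1/n1 + 1/n2)
student <- t.test(DataframeA$Column1, DataframeB$Column2, var.equal = TRUE)
d_from_t <- abs(unname(student$statistic)) *
  sqrt(1 / nrow(DataframeA) + 1 / nrow(DataframeB))

all.equal(d, d_from_t)
```

The link to t only holds for the pooled-variance (var.equal = TRUE) test, because Welch's test uses a different standard error.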
Álvaro
  • The dataframes actually don't have the same number of observations. But their variance is equal. So I'd like to use a student's t-test, rather than Welch's. Any advice on how to do that? (When I change paired=TRUE to instead be paired=FALSE, it runs a Welch's t-test, as did Ashish Baid's suggested approach, below.) Otherwise this was fantastically helpful and informative--thank you! – rrrookie Apr 09 '21 at 13:46
  • Also I'd like to calculate the Cohen's d with respect to the results of the t-test. Any advice? – rrrookie Apr 09 '21 at 14:07
  • I edited the answer to include your new questions ^^ It'd be helpful if you marked it as the answer, to let other people know it has been answered – Álvaro Apr 09 '21 at 21:30

This should work for you

res = t.test(DataframeA$Column1, DataframeB$Column2, alternative = "two.sided", var.equal = FALSE)
Ashish Baid