I have this:

subset_1 <- degs[degs$pathol=='fibrosis'&degs$value==1,]
mean_1 <- mean(subset_1$logFC)
sd_1 <- sd(subset_1$logFC)
tsub_1 <- t.test(mean_1, sd_1, alternative = c("two.sided", "less", "greater"))

subset_0 <- degs[degs$pathol=='fibrosis'&degs$value==0,]
mean_0 <- mean(subset_0$logFC)
sd_0 <- sd(subset_0$logFC)

ttest <- t.test(mean_1, sd_1)

I am getting this error:

Error in t.test.default(mean_1, sd_1) : not enough 'x' observations

I am trying to run a t-test comparing the logFC column of subset_1 with that of subset_0.

I am not sure how to set up the test. I thought the input had to be the mean and standard deviation, so I calculated both from the logFC column of each subset table and tried passing them to t.test in both of the ways shown above. I have read a lot about this but am still having trouble with the input.

Rui Barradas
Francesca C
  • I don't understand. You want to run a t-test between `subset_1$logFC` and `subset_0$logFC`, but your code line is `t.test(mean_1, sd_1)`, whose arguments are just the mean and SD of `subset_1$logFC`. Why not `t.test(subset_1$logFC, subset_0$logFC)`? – Basti Apr 22 '22 at 10:58
  • Have you read the `t.test` documentation? https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/t.test (x: a (non-empty) numeric vector of data values; y: an optional (non-empty) numeric vector of data values.) – Basti Apr 22 '22 at 11:05
  • @Basti yes indeed I have, and I don't understand what it can take as input if it must be a mean and standard deviation. I basically have a column of logFC scores that I am starting out with. – Francesca C Apr 22 '22 at 13:41
  • I ran the t.test on subsets 0 and 1 using just the logFC columns, and it does work, but I just don't know if it's... correct? – Francesca C Apr 22 '22 at 13:43
  • `t.test` "performs one and two sample t-tests on vectors of data." This means you need to pass in the entire data series whose means you want to compare. You seem to be confusing performing a t-test (i.e. comparing the means of data series) with running `t.test` in R (i.e. given two data series, asking whether their means differ). – Basti Apr 22 '22 at 14:01
  • @Basti, mindblower, so they are two different things, performing a t-test and running `t.test`. Crap. This is why my searches have been confusing. Thank you for answering what I couldn't seem to ask, you definitely hit the nail on the head with that one. – Francesca C Apr 22 '22 at 15:23
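
The distinction discussed in the comments above can be sketched as follows. This is a minimal illustration only: the `degs` table here is simulated stand-in data (the question includes none), so the values are hypothetical and only the column names follow the question.

```r
# Simulated stand-in for the question's `degs` table (hypothetical values).
set.seed(1)
degs <- data.frame(
  pathol = "fibrosis",
  value  = rep(c(0, 1), each = 20),
  logFC  = c(rnorm(20, mean = 0), rnorm(20, mean = 1))
)

subset_1 <- degs[degs$pathol == "fibrosis" & degs$value == 1, ]
subset_0 <- degs[degs$pathol == "fibrosis" & degs$value == 0, ]

# Passing two single numbers (a mean and an SD) fails: t.test treats each
# argument as a sample, and a sample of size 1 has no variance to estimate.
err <- tryCatch(
  t.test(mean(subset_1$logFC), sd(subset_1$logFC)),
  error = function(e) conditionMessage(e)
)
print(err)

# Passing the full logFC vectors runs a two-sample t-test
# (Welch's version, since var.equal defaults to FALSE).
res <- t.test(subset_1$logFC, subset_0$logFC)
print(res)
```

`t.test` computes the means and standard deviations itself from the vectors it is given, which is why they do not need to be supplied separately.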

1 Answer

You are overcomplicating this. Use the `subset` argument of `t.test` to keep only the rows you want.
Untested, since there are no data in the question.

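# Logical index: rows whose pathology is fibrosis
fibr <- degs$pathol == "fibrosis"
# Logical index: rows whose value is 0 or 1 (the two groups to compare)
val01 <- degs$value %in% c(0, 1)

# Two-sample t-test of logFC between the value groups, fibrosis rows only
ttest <- t.test(logFC ~ value, data = degs, subset = fibr & val01)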
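
Since the snippet above is untested, here is a small runnable check on simulated data. The `degs` values and the second pathology level `"steatosis"` are hypothetical stand-ins, not data from the question. The sketch suggests that the formula call with `subset` matches a direct two-vector call on the same rows.

```r
# Simulated stand-in for `degs`; "steatosis" is a made-up second level so
# that the subset argument actually has rows to drop.
set.seed(42)
degs <- data.frame(
  pathol = rep(c("fibrosis", "steatosis"), each = 40),
  value  = rep(c(0, 1), times = 40),
  logFC  = rnorm(80)
)

fibr  <- degs$pathol == "fibrosis"
val01 <- degs$value %in% c(0, 1)

# Formula interface: the groups are taken from `value`, in the order 0 then 1.
by_formula <- t.test(logFC ~ value, data = degs, subset = fibr & val01)

# The same comparison written with two explicit vectors.
x0 <- degs$logFC[fibr & degs$value == 0]
x1 <- degs$logFC[fibr & degs$value == 1]
by_vectors <- t.test(x0, x1)

# Both calls should report the same t statistic and p-value.
print(c(formula = unname(by_formula$statistic),
        vectors = unname(by_vectors$statistic)))
print(c(formula = by_formula$p.value, vectors = by_vectors$p.value))
```

Note that the formula interface compares group 0 against group 1, so the sign of the statistic flips relative to `t.test(x1, x0)`.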
Rui Barradas
  • How can I know this is statistically correct/significant though? I read something about getting the mean and SD first... thank you, this worked well. – Francesca C Apr 22 '22 at 13:45
  • @FrancescaC The SD is relevant because the variances of the two groups can be equal or not. R defaults to `var.equal = FALSE`, see the [documentation](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/t.test.html). What does your subject matter knowledge tell you on that? That's what is really important. – Rui Barradas Apr 22 '22 at 14:16
  • Well, then my question is about the code you wrote above: it doesn't include the SD, am I correct? – Francesca C Apr 22 '22 at 15:21
  • @FrancescaC t-tests *always* include the SD, but the way the SDs are computed may vary. Do you have reasons to believe the SDs/variances are different? If yes, leave R's default; if not, use `var.equal = TRUE`. – Rui Barradas Apr 22 '22 at 18:07
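
To make the last point concrete, the following sketch rebuilds the t statistic by hand from group means and SDs and compares it with what `t.test` reports. The two vectors are simulated and hypothetical. The formulas are the standard Welch and pooled-variance statistics.

```r
# Two simulated groups with different sizes and spreads (hypothetical data).
set.seed(7)
x1 <- rnorm(15, mean = 1, sd = 2)
x0 <- rnorm(25, mean = 0, sd = 1)

n1 <- length(x1); n0 <- length(x0)
m1 <- mean(x1);   m0 <- mean(x0)
s1 <- sd(x1);     s0 <- sd(x0)

# Welch statistic: each group keeps its own variance (R's default).
t_welch_manual <- (m1 - m0) / sqrt(s1^2 / n1 + s0^2 / n0)

# Pooled statistic: one shared variance estimate (var.equal = TRUE).
sp2 <- ((n1 - 1) * s1^2 + (n0 - 1) * s0^2) / (n1 + n0 - 2)
t_pooled_manual <- (m1 - m0) / sqrt(sp2 * (1 / n1 + 1 / n0))

# t.test derives the same quantities internally from the raw vectors.
t_welch  <- unname(t.test(x1, x0)$statistic)
t_pooled <- unname(t.test(x1, x0, var.equal = TRUE)$statistic)

print(c(manual = t_welch_manual,  t.test = t_welch))
print(c(manual = t_pooled_manual, t.test = t_pooled))
```

In both variants the SDs enter through the denominator. The `var.equal` setting only changes how the two SDs are combined and how the degrees of freedom are computed.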