I suspect that MattParker's comment is going to be the biggest thing here: you are comparing a single number with a vector, and t.test will complain about that. Since you suggested that you want to perform tests per grouping variable (id), in base R you probably want to use a function like by (or split). (There are great methods within dplyr and data.table as well.)
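To see the complaint for yourself, here is a minimal sketch. The values 5 and 1:10 are made up for illustration, and the exact wording of the error may differ between R versions.

```r
# A single number against a vector: the default (Welch) test needs at least
# two observations in x to estimate its variance, so this call errors.
msg <- tryCatch(
  {
    t.test(5, 1:10)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# With two or more observations in x, the same kind of call runs.
ok <- t.test(c(5, 6, 7), 1:10)
ok$p.value
```

This is why the test has to be fed the whole vector of values for each id rather than one value at a time.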
Using mtcars as sample data, I'll try to mimic your data:
dat <- mtcars[c("cyl", "mpg")]
colnames(dat) <- c("id", "ratio")
It isn't clear what you mean to use for dist, so I'll use a naïve placeholder:
dist <- 1:10
Now you can do:
by(dat$ratio, dat$id, function(x) t.test(x, dist, paired = FALSE)$p.value)
# dat$id: 4
# [1] 2.660716e-10
# ------------------------------------------------------------
# dat$id: 6
# [1] 4.826322e-09
# ------------------------------------------------------------
# dat$id: 8
# [1] 2.367184e-07
If you need to work with more than just ratio at a time, you can alternatively pass the whole data.frame:
by(dat, dat$id, function(x) t.test(x$ratio, dist, paired = FALSE)$p.value)
# dat$id: 4
# [1] 2.660716e-10
# ------------------------------------------------------------
# dat$id: 6
# [1] 4.826322e-09
# ------------------------------------------------------------
# dat$id: 8
# [1] 2.367184e-07
The result of the call to by has class "by", which is really just a repackaged list with some extra attributes:
res <- by(dat, dat$id, function(x) t.test(x$ratio, dist, paired = FALSE)$p.value)
class(res)
# [1] "by"
str(attributes(res))
# List of 4
# $ dim : int 3
# $ dimnames:List of 1
# ..$ dat$id: chr [1:3] "4" "6" "8"
# $ call : language by.data.frame(data = dat, INDICES = dat$id, FUN = function(x) t.test(x$ratio, dist, paired = FALSE)$p.value)
# $ class : chr "by"
So you can expand or access it however you would a list:
res[[1]]
# [1] 2.660716e-10
as.numeric(res)
# [1] 2.660716e-10 4.826322e-09 2.367184e-07
names(res)
# [1] "4" "6" "8"
(Realize that the different levels of dat$id are the integers 4, 6, and 8, so the names should correspond to your $id.)
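If a plain named vector is all you need, the split route mentioned at the top can be sketched as follows. This is only one way to do it; the sample data and the placeholder dist are rebuilt so the snippet runs on its own.

```r
# Same sample data as above, rebuilt so this sketch is self-contained.
dat <- mtcars[c("cyl", "mpg")]
colnames(dat) <- c("id", "ratio")
dist <- 1:10  # placeholder comparison vector, as assumed in this answer

# split() returns a named list holding one ratio vector per id.
groups <- split(dat$ratio, dat$id)

# vapply() runs the test on each group and returns a named numeric vector.
pvals <- vapply(
  groups,
  function(x) t.test(x, dist, paired = FALSE)$p.value,
  numeric(1)
)
pvals
```

The names of pvals are the id levels, so pvals[["4"]] pulls out the p-value for id 4.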
Edit:
If you want the results in a data.frame, two options come to mind:
- Repeat the p-value on each and every row, resulting in a lot of duplication. I discourage this method for several reasons; if you need it at some point, I suggest using the second option and then merge.
- Produce a data.frame with as many rows as there are unique id values. Something like:
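# Pass x$ratio (not the whole data.frame x) so the id column is not mixed into the test.
do.call(rbind.data.frame,
        by(dat, dat$id, function(x) list(id = x$id[1], pv = t.test(x$ratio, dist, paired = FALSE)$p.value)))
#   id           pv
# 4  4 2.660716e-10
# 6  6 4.826322e-09
# 8  8 2.367184e-07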
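If you do end up needing the p-value repeated on every row (the first option), a minimal sketch of the merge step might look like this. The names per_id and merged are made up for illustration, and the per-id table is built here with split and vapply just to keep the snippet short and self-contained.

```r
# Sample data and placeholder dist, rebuilt so the sketch runs on its own.
dat <- mtcars[c("cyl", "mpg")]
colnames(dat) <- c("id", "ratio")
dist <- 1:10

# One p-value per id, as a small lookup table.
pv <- vapply(
  split(dat$ratio, dat$id),
  function(x) t.test(x, dist, paired = FALSE)$p.value,
  numeric(1)
)
per_id <- data.frame(id = as.numeric(names(pv)), pv = unname(pv))

# merge() joins on the shared id column, repeating pv on every matching row.
# Note that it sorts by id and drops the original row names.
merged <- merge(dat, per_id, by = "id")
head(merged)
```

Keeping per_id as its own table means the duplication only exists where you actually need it.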