I have two data.frames (code below). The row ids in the two data.frames match.
set.seed(12345)
df1 <- data.frame(id=c(letters),
g1=c(rep(0,7), rep(1,18), "NA"),
g2=c(rep("NA", 3), rep(0,23)),
g3=c(rep(0,5), rep("NA",20),1),
g4=c(sample(c(0,1), replace=TRUE, size=26)),
g5=c(sample(c(0,1), replace=TRUE, size=20), "NA", sample(c(0,1), replace=TRUE, size=5)),
g6=c(rep(1,26)),
g7=c(sample(c(0,1), replace=TRUE, size=26)),
g8=c(rep(0,5), rep("NA",21)),
g9=c(rep("NA",26)),
g10=c(rep(0,26)))
df1[,2:11] <- sapply(df1[,2:11], as.numeric)
df2 <- data.frame(id=c(letters),
b1=c(runif(7, 3, 7.8), "NA", runif(18, 6, 18)),
b2=c(runif(7, -1, 4), "NA", runif(18, 0, 5)),
b3=c(runif(20, 0, 16), "NA", runif(5, -1, 2)),
b4=c(runif(7, 5, 29), rep("NA", 3), runif(3, -1, 2)),
b5=c(runif(3, 3, 8), rep("NA",23)),
b6=c(rep("NA",21), runif(5, 0, 19)),
b7=c(rep("NA",26)),
b8=c(runif(26, 1, 9)),
b9=c(runif(7, -1, 4), "NA", runif(18, -1, 4)),
b10=c(runif(6, -1, 4), rep("NA", 2), runif(18, 0, 5)),
b11=c(runif(7, 18, 23), "NA", runif(18, 12, 19)),
b12=c(runif(7, 0, 4), "NA", runif(18, 0, 4)),
b13=c(runif(7, 1, 4), "NA", runif(14, -2, 18), rep("NA", 4)),
b14=c(runif(6, 6, 8), rep("NA", 3), runif(17, 0, 5)),
b15=c(runif(7, 11, 12), "NA", runif(18, -1, 5)),
b16=c(runif(7, 3, 4), "NA", runif(18, 12, 21)),
b17=c(rep("NA", 8), runif(16, 1, 8), rep("NA", 2)))
df2[,2:18] <- sapply(df2[,2:18], as.numeric)
I would like to use t-tests to check, for each "g" column in df1, whether the 0 and 1 groups have significantly different values in each "b" column of df2.
For example, for b1:
t.test(df2$b1[1:8], df2$b1[9:26])$p.val
#[1] 3.846501e-07
I would like to create a new data.frame, df3, with the results. It should look like this (the values in the b2 and b3 rows are placeholders):
df3 <- data.frame(g=rep("g1", 3),
b=c("b1", "b2", "b3"),
mean_0=c(mean(na.omit(df2$b1[1:8])),0,0),
mean_1=c(mean(na.omit(df2$b1[9:26])),0,0),
p.val=c(t.test(df2$b1[1:8], df2$b1[9:26])$p.val,1,1),
p.adjust=c(0,1,1))
df3 <- df3[order(df3$p.val),]
I do not know how to code this for every combination of "g" and "b" columns. Can anyone help?
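One possible approach, as a minimal sketch rather than a definitive solution: loop over every g/b column pair, split the b values by the 0/1 values in the g column, and collect the results. The helper name `compare_groups` and the toy data are hypothetical. The sketch assumes that the groups are defined by the values in the g column (not by fixed row ranges), that Welch's t-test (the `t.test` default) is wanted, and that "BH" is an acceptable `p.adjust` method, since the desired correction is not stated. Pairs where `t.test` cannot run (a group with fewer than two values, or constant data) get `NA`.

```r
# Hypothetical helper: one Welch t-test per (g, b) column pair.
# Assumes both data.frames have an "id" column and rows in the same order.
compare_groups <- function(g_df, b_df) {
  g_cols <- setdiff(names(g_df), "id")
  b_cols <- setdiff(names(b_df), "id")
  pairs <- expand.grid(b = b_cols, g = g_cols, stringsAsFactors = FALSE)

  rows <- lapply(seq_len(nrow(pairs)), function(i) {
    g <- g_df[[pairs$g[i]]]
    b <- b_df[[pairs$b[i]]]
    # Non-missing b values for the 0 group and the 1 group
    x0 <- b[!is.na(g) & g == 0 & !is.na(b)]
    x1 <- b[!is.na(g) & g == 1 & !is.na(b)]
    # t.test() stops with an error when a group has fewer than two values
    # or the data are constant; record NA for those pairs instead.
    p <- tryCatch(t.test(x0, x1)$p.value, error = function(e) NA_real_)
    data.frame(g = pairs$g[i], b = pairs$b[i],
               mean_0 = if (length(x0)) mean(x0) else NA_real_,
               mean_1 = if (length(x1)) mean(x1) else NA_real_,
               p.val = p)
  })

  res <- do.call(rbind, rows)
  # Assumption: Benjamini-Hochberg correction; NA p-values stay NA
  res$p.adjust <- p.adjust(res$p.val, method = "BH")
  res[order(res$p.val), ]  # NA p-values are sorted last
}

# Toy data (hypothetical) to show the shape of the result
set.seed(1)
toy_g <- data.frame(id = letters[1:10], g1 = c(rep(0, 5), rep(1, 4), NA))
toy_b <- data.frame(id = letters[1:10],
                    b1 = c(rnorm(5, 0), rnorm(5, 5)),
                    b2 = c(rnorm(9), NA))
res <- compare_groups(toy_g, toy_b)
print(res)
```

With the data from the question, the call would be `df3 <- compare_groups(df1, df2)`. Columns such as g6, g9 and g10, which contain only one group or only NA, would end up with `NA` p-values at the bottom of the result.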