I have a dataframe with the following structure:
set.seed(1)
dat <- data.frame(gender   = sample(rep(c("Man", "Woman"), 3000)),
                  age      = sample(rep(c("Young", "Old"), 3000)),
                  question = rep(c("Q1", "Q2", "Q3"), 2000),
                  response = rep(c("Res1", "Res2"), 3000),
                  value    = sample(rep(c(0, 1), 3000)))
head(dat)
#   gender   age question response value
#1    Man   Old       Q1     Res1     0
#2    Man Young       Q2     Res2     1
#3    Man   Old       Q3     Res1     0
#4  Woman   Old       Q1     Res2     1
#5    Man   Old       Q2     Res1     1
#6    Man   Old       Q3     Res2     1
I have written a nested loop that runs a t-test for every response within each question and combines the results into a single data frame:
library(tidyverse)
library(rstatix)
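# Levels are computed once so they are not recomputed on every iteration
questions <- names(table(dat$question))
responses <- names(table(dat$response))

data.list1 <- list()
for (i in seq_along(questions)) {
  # Keep only the rows for the current question
  dat1 <- dat %>%
    filter(question == questions[[i]])

  data.list2 <- list()
  # f runs over every column except the last three (here: gender, age)
  for (f in 1:(ncol(dat1) - 3)) {
    dat2 <- dat1 %>%
      t_test(reformulate(colnames(dat1)[f], "value"),
             detailed = TRUE) %>%
      # Label the row with the current question and the f-th response level
      mutate(question = questions[[i]],
             response = responses[[f]])
    data.list2[[f]] <- dat2
  }
  data.list1[[i]] <- bind_rows(data.list2)
}

final.output <- bind_rows(data.list1) %>%
  select(question, response, group1, estimate1,
         group2, estimate2, p)
final.output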
#  question response group1 estimate1 group2 estimate2     p
#  <chr>    <chr>    <chr>      <dbl> <chr>      <dbl> <dbl>
#1 Q1       Res1     Man        0.492 Woman      0.494 0.932
#2 Q1       Res2     Old        0.484 Young      0.502 0.418
#3 Q2       Res1     Man        0.500 Woman      0.509 0.687
#4 Q2       Res2     Old        0.489 Young      0.518 0.198
#5 Q3       Res1     Man        0.495 Woman      0.510 0.504
#6 Q3       Res2     Old        0.511 Young      0.494 0.452
My problem is that the data frame I am actually working with is much larger than this example and contains more variables, so the loop takes a very long time to run (over 10 minutes). Is there a way to obtain the same output without using a loop?
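For reference, here is a rough sketch of the loop-free shape I have in mind. It is not a confirmed solution. It assumes the grouping columns are `gender` and `age`, stacks them into long format, and calls base R's `t.test()` (Welch by default) once per question and grouping variable. The column name `variable` is my own placeholder and takes the place of the `response` label from the loop above. The rows come out sorted by `variable`, so `age` precedes `gender`. I have not benchmarked it against the loop.

```r
library(dplyr)
library(tidyr)

set.seed(1)
dat <- data.frame(gender   = sample(rep(c("Man", "Woman"), 3000)),
                  age      = sample(rep(c("Young", "Old"), 3000)),
                  question = rep(c("Q1", "Q2", "Q3"), 2000),
                  response = rep(c("Res1", "Res2"), 3000),
                  value    = sample(rep(c(0, 1), 3000)),
                  stringsAsFactors = FALSE)

final.output <- dat %>%
  # Stack the grouping columns: one row per observation and grouping variable.
  # "variable" and "group" are placeholder names chosen for this sketch.
  pivot_longer(c(gender, age), names_to = "variable", values_to = "group") %>%
  group_by(question, variable) %>%
  summarise(
    # The two group labels in alphabetical order (e.g. Man/Woman, Old/Young)
    group1    = sort(unique(group))[1],
    group2    = sort(unique(group))[2],
    # Mean of value within each group
    estimate1 = mean(value[group == group1]),
    estimate2 = mean(value[group == group2]),
    # Two-sample Welch t-test p-value (base R default, var.equal = FALSE)
    p         = t.test(value[group == group1], value[group == group2])$p.value,
    .groups   = "drop"
  ) %>%
  select(question, variable, group1, estimate1, group2, estimate2, p)

final.output
```

With more grouping variables, only the column selection passed to `pivot_longer()` should need to change, provided every grouping column has exactly two levels.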