I want to run t-tests comparing genders within each combination of two grouping variables (group_1 and group_2), for several outcome variables (var1 and var2 here; my real dataset has many more).
#Packages
library(dplyr)
library(tidyr) # gather() comes from tidyr, not reshape2
library(rstatix)
##Dataset
group_1 <-c(rep("Group X", 40), rep("Group Y", 40),
rep("Group Z", 60), rep("Group Y", 20),
rep("Group Z", 50), rep("Group Y", 10))
group_2 <- c(rep("A", 100), rep("B", 20), rep("C", 50), rep("A", 20), rep("B", 30))
var1 <- rnorm(n=220, mean=0, sd=1)
var2 <- rnorm(n = 220, mean = 1, sd=1.3)
gender <- c(rep("M", 30), rep("F", 30), rep("M", 40) , rep("F", 50), rep("M", 20),
rep("F", 20), rep("M", 30))
data <- data.frame(group_1, group_2, var1, var2, gender) # data.frame() keeps var1/var2 numeric; cbind() would coerce every column to character
##Groupings
table(data$group_1, data$group_2, data$gender)
#Long format
g_long <- gather(data, variable, value, var1:var2)
g_long$value <- as.numeric(g_long$value)
#T-tests for each variable within groups
g_test <- g_long %>%
group_by(variable, group_1, group_2) %>%
t_test(value ~ gender, p.adjust.method = "holm", paired=FALSE)
The above code gives me the error below:
Error: Problem with `mutate()` input `data`.
x not enough 'y' observations
i Input `data` is `map(.data$data, .f, ...)`.
The code does work if I group by only one grouping variable, or if I manually remove the problematic category:
#this works
g_test <- g_long %>%
group_by(variable, group_1) %>%
t_test(value ~ gender, p.adjust.method = "holm", paired=FALSE)
#Manually remove category where I cannot calculate gender diff - this works
g_long1 <- g_long[!(g_long$group_1 == "Group Y" & g_long$group_2 == "B"),]
g_test <- g_long1 %>%
group_by(variable, group_1, group_2) %>%
t_test(value ~ gender, p.adjust.method = "holm", paired=FALSE)
There are no women in the combination group_1 = "Group Y" and group_2 = "B", so the code works once I remove that category manually. I tried the filter below to detect and remove such categories automatically, but it doesn't help: it can only drop gender-by-category cells that exist and are small, so a category in which one gender is missing entirely (no men, or no women) is left untouched.
g_long<- g_long %>%
group_by(group_1, group_2, variable, gender) %>%
filter(n() >= 5)
How can I automatically remove the categories for which a t-test cannot be run? Each grouping variable has more than 3 categories in my real dataset, so excluding combinations by hand would be impractical.
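For reference, here is a minimal sketch of the kind of filter I am after, not a confirmed solution. It groups by category only (not by gender) and keeps a category when both genders have enough rows. The names `min_n` and `testable` are hypothetical, the threshold of 2 per gender is an assumption based on what the default Welch t-test appears to need, the gender labels "M" and "F" are hardcoded, and `set.seed()` is added only to make the sketch reproducible.

```r
library(dplyr)
library(tidyr)
library(rstatix)

set.seed(1) # assumption: fixed seed, only for reproducibility

# Same layout as the example dataset above
data <- data.frame(
  group_1 = c(rep("Group X", 40), rep("Group Y", 40), rep("Group Z", 60),
              rep("Group Y", 20), rep("Group Z", 50), rep("Group Y", 10)),
  group_2 = c(rep("A", 100), rep("B", 20), rep("C", 50), rep("A", 20), rep("B", 30)),
  var1    = rnorm(220, mean = 0, sd = 1),
  var2    = rnorm(220, mean = 1, sd = 1.3),
  gender  = c(rep("M", 30), rep("F", 30), rep("M", 40), rep("F", 50),
              rep("M", 20), rep("F", 20), rep("M", 30))
)

# Long format: one row per observation and outcome variable
g_long <- pivot_longer(data, var1:var2, names_to = "variable", values_to = "value")

# Hypothetical threshold: minimum number of rows required per gender
min_n <- 2

# Group by category only, then keep categories where BOTH genders
# reach the threshold; a missing gender counts as 0 rows
testable <- g_long %>%
  group_by(variable, group_1, group_2) %>%
  filter(sum(gender == "M") >= min_n, sum(gender == "F") >= min_n) %>%
  ungroup()

# Same t-test call as before, now on the filtered data
g_test <- testable %>%
  group_by(variable, group_1, group_2) %>%
  t_test(value ~ gender, p.adjust.method = "holm", paired = FALSE)
```

The difference from my attempt above is that gender is left out of `group_by()`, so the condition is evaluated per category and can see that one gender has zero rows.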