R: t.test multiple variables in dataframe with dplyr then summarise in table

Question

Suppose I have this reproducible dataset:

set.seed(949494)
KPI1 <- round(runif(50, 1, 100))
KPI2 <- round(runif(50, 1, 100))
KPI3 <- round(runif(50, 1, 100))
ID <- rep(c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7", "ID8", "ID9", "ID10"), times = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5))
Stimuli <- rep(rep(c("A", "B"), times = c(5, 5)), 5)
AOI <- rep(c("Text", "Picture", "Button", "Product", "Logo"), 5)
DF <- data.frame(ID, Stimuli, AOI, KPI1, KPI2, KPI3)

Is it possible to run t-tests on all KPI columns, per AOI, between Stimuli A and B with dplyr?

Currently, I am doing this by hand on a much larger dataset which is very time-consuming:

#SUBSET DATAFRAME into A / B DATAFRAMES
DF_A <- subset(DF, Stimuli == "A")
DF_B <- subset(DF, Stimuli == "B")

#SUBSET A / B DATAFRAMES into AOI DATAFRAMES
DF_A_Text <- subset(DF_A, AOI == "Text")
DF_B_Text <- subset(DF_B, AOI == "Text")


#t.test AOIs A vs B
t.test(DF_A_Text$KPI1, DF_B_Text$KPI1)

t.test(DF_A_Text$KPI2, DF_B_Text$KPI2)

t.test(DF_A_Text$KPI3, DF_B_Text$KPI3)

I then repeat these steps for each AOI ("Picture" ... "Logo"), which is very time-consuming. I think it is possible with dplyr, but I have not been able to work out the syntax for my specific use case.

The final goal is to show the p-value of each t-test next to the per-Stimuli summaries (A vs. B), i.e. the average of each KPI (1:3) across all IDs (1:10) for each AOI (1:5).

Thankful for any help I can get as I'm an R beginner.

score 1 · Accepted Answer · answered Aug 31 '22 at 18:23

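I would use dplyr together with tidyr and purrr (all part of the tidyverse) for this analysis, as follows:

library(dplyr)
library(tidyr)  # pivot_longer(), nest()
library(purrr)  # map_dbl()

DF %>% 
  # long format: one row per observation and KPI
  pivot_longer(starts_with("KP"), names_to = "KP", values_to = "value") %>% 
  # one nested data frame per AOI/KPI combination
  group_by(AOI, KP) %>% 
  nest() %>% 
  mutate(
    # p-value of the A vs. B t-test within each AOI/KPI combination
    pval = map_dbl(data, ~t.test(value ~ Stimuli, data = .x)$p.value), 
    # group means for Stimuli A and B
    mean_a = map_dbl(data, ~mean(.x$value[.x$Stimuli == "A"])), 
    mean_b = map_dbl(data, ~mean(.x$value[.x$Stimuli == "B"]))
  ) %>% 
  select(-data) %>% 
  arrange(KP, AOI)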
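As a sanity check (a sketch, not part of the original answer), the formula call t.test(value ~ Stimuli) used in the pipeline should give the same p-value as the two-vector call from the question. The snippet below rebuilds the question's data with base R only and compares both calls for one AOI and one KPI. The names text_rows, p_manual and p_formula are introduced here purely for illustration.

```r
# Rebuild the question's example data (same seed, same order of runif() calls)
set.seed(949494)
KPI1 <- round(runif(50, 1, 100))
KPI2 <- round(runif(50, 1, 100))
KPI3 <- round(runif(50, 1, 100))
DF <- data.frame(
  ID      = rep(paste0("ID", 1:10), each = 5),
  Stimuli = rep(rep(c("A", "B"), each = 5), times = 5),
  AOI     = rep(c("Text", "Picture", "Button", "Product", "Logo"), times = 10),
  KPI1, KPI2, KPI3
)

# All rows for one AOI (five from Stimuli A, five from Stimuli B)
text_rows <- subset(DF, AOI == "Text")

# Manual approach from the question: two numeric vectors
p_manual <- t.test(text_rows$KPI1[text_rows$Stimuli == "A"],
                   text_rows$KPI1[text_rows$Stimuli == "B"])$p.value

# Formula approach used inside the pipeline
p_formula <- t.test(KPI1 ~ Stimuli, data = text_rows)$p.value

# Compare the two p-values
print(all.equal(p_manual, p_formula))
```

Both calls use t.test()'s default settings, so each runs an unpaired Welch test (var.equal = FALSE). If a pooled-variance or paired test is wanted instead, that argument would need to be changed in the pipeline as well.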

Will Oldham

Thank you! I was able to solve real data NA issues by using 'na.omit() %>%' in the beginning of the pipe. – Smuts94 Sep 05 '22 at 13:06
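One thing to keep in mind with na.omit() at the start of the pipe: it drops every row that has an NA in any column, so a missing KPI1 also discards that row's KPI2 and KPI3 values. A possible alternative, sketched below under the assumption that NAs occur only in the KPI columns, is to drop missing values after reshaping with the values_drop_na argument of pivot_longer(). The injected NA and the extra column n are hypothetical, added only to illustrate the behaviour.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the question's example data
set.seed(949494)
KPI1 <- round(runif(50, 1, 100))
KPI2 <- round(runif(50, 1, 100))
KPI3 <- round(runif(50, 1, 100))
DF <- data.frame(
  ID      = rep(paste0("ID", 1:10), each = 5),
  Stimuli = rep(rep(c("A", "B"), each = 5), times = 5),
  AOI     = rep(c("Text", "Picture", "Button", "Product", "Logo"), times = 10),
  KPI1, KPI2, KPI3
)

# Hypothetical missing value, injected only for illustration
DF$KPI1[1] <- NA

# na.omit() removes the whole row, including its valid KPI2 and KPI3 values
nrow(na.omit(DF))  # 49 of 50 rows remain

result <- DF %>%
  # drop only the missing KPI value, not the whole original row
  pivot_longer(starts_with("KPI"), names_to = "KPI", values_to = "value",
               values_drop_na = TRUE) %>%
  group_by(AOI, KPI) %>%
  nest() %>%
  mutate(
    n      = map_int(data, nrow),  # observations left per AOI/KPI
    pval   = map_dbl(data, ~ t.test(value ~ Stimuli, data = .x)$p.value),
    mean_a = map_dbl(data, ~ mean(.x$value[.x$Stimuli == "A"])),
    mean_b = map_dbl(data, ~ mean(.x$value[.x$Stimuli == "B"]))
  ) %>%
  ungroup() %>%
  select(-data) %>%
  arrange(KPI, AOI)

print(result)
```

With this variant only the Text/KPI1 test loses an observation. All other AOI/KPI combinations still use their full ten values.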
