
I have a data frame containing different items, their costs, and their groupings. I would like to run an ANOVA and/or a t-test for each item, based on its groupings, to see whether the mean cost differs between groups. Does anybody know how to do this in R?

A sample of the data frame is as follows:

Item     Cost  Grouping
Book A   7     A
Book A   9     B
Book A   6     A
Book A   7     B
Book B   4     A
Book B   6     B
Book B   5     A
Book B   3     C
Book C   5     C
Book C   4     A
Book C   7     C
Book C   2     B
Book C   2     B
Book D   4     A
Book D   2     C
Book D   9     C
Book D   4     A

The output should be a simple table (or something similar) like this:

Item     P-value (from ANOVA/t-test; H0: mean is the same for all groupings)
Book A   xxx
Book B   xxx
Book C   xxx
Book D   xxx

Thanks in advance!

Luther_Proton

2 Answers


Instead of dealing with multiple ANOVAs, t-tests and worrisome (and potentially questionable) p-values, I would fit a single generalised linear mixed-effects model with item as a fixed effect and group as a random effect. This is easy to do in a fully Bayesian way using rstanarm, which gives the full posterior distribution of the mean for every item. Instead of worrying about the suitability and interpretability of (multiple) hypothesis tests, we can then compare the posterior distributions of the means directly.

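library(rstanarm)
# df is the question's data; rename its columns to match the formula below
names(df) <- c("item", "cost", "group")
model <- stan_glmer(cost ~ 0 + item + (1 | group), data = df)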

We can summarise the posterior distribution of each item's mean by showing its posterior median and 90% posterior uncertainty interval.

library(broom.mixed)
library(dplyr)    # provides %>%
library(ggplot2)
tidy(model, conf.int = TRUE) %>%
    ggplot(aes(y = term)) + 
    geom_point(aes(x = estimate)) + 
    geom_linerange(aes(xmin = conf.low, xmax = conf.high))


Or as a table:

tidy(model, conf.int = TRUE)
## A tibble: 4 × 5
#  term       estimate std.error conf.low conf.high
#  <chr>         <dbl>     <dbl>    <dbl>     <dbl>
#1 itemBook A     7.28      1.17     5.09      9.40
#2 itemBook B     4.44      1.16     2.27      6.45
#3 itemBook C     3.88      1.05     1.89      5.75
#4 itemBook D     4.63      1.21     2.41      6.71

Here,

  • estimate is the posterior median,
  • std.error is the posterior MAD, and
  • conf.low and conf.high are the lower and upper bounds of the 90% posterior uncertainty interval.
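To make these three definitions concrete, here is a minimal sketch that computes the same summaries column by column from a matrix of posterior draws. The helper name `summarise_draws` is hypothetical, and the draws below are simulated stand-ins, not output from the model above. With the fitted model one would instead pass something like `as.matrix(model, regex_pars = "^item")`. It assumes `std.error` uses `stats::mad()` with its default scaling, which is how the "posterior MAD" is usually reported.

```r
# Hypothetical helper: summarise each column of a posterior-draws matrix
# as posterior median, posterior MAD and central 90% interval
summarise_draws <- function(draws, level = 0.9) {
  alpha <- (1 - level) / 2
  data.frame(
    term      = colnames(draws),
    estimate  = apply(draws, 2, median),
    std.error = apply(draws, 2, mad),
    conf.low  = apply(draws, 2, quantile, probs = alpha),
    conf.high = apply(draws, 2, quantile, probs = 1 - alpha),
    row.names = NULL
  )
}

# Simulated stand-in draws, for illustration only
set.seed(1)
draws <- cbind(`itemBook A` = rnorm(4000, 7, 1),
               `itemBook B` = rnorm(4000, 4.5, 1))
summarise_draws(draws)
```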
Maurits Evers

You could use anova_test from the rstatix package like this:

df <- data.frame(Item = c("Book A", "Book A", "Book A", "Book A", "Book B", "Book B", "Book B", "Book B"),
                 Cost = c(7,9,6,7,4,6,5,3),
                 Grouping = c("A", "B", "A", "B", "A", "B", "A", "C"))

library(dplyr)
library(rstatix)
df %>% 
  group_by(Item) %>%
  anova_test(Cost ~ Grouping)
#> Coefficient covariances computed by hccm()
#> Coefficient covariances computed by hccm()
#> # A tibble: 2 × 8
#>   Item   Effect     DFn   DFd     F     p `p<.05`   ges
#> * <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <chr>   <dbl>
#> 1 Book A Grouping     1     2   1.8 0.312 ""      0.474
#> 2 Book B Grouping     2     1   4.5 0.316 ""      0.9

Created on 2022-07-10 by the reprex package (v2.0.1)
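For comparison, here is a base-R sketch of the same per-item one-way ANOVA, using the full sample data from the question and producing the one-row-per-item table that was asked for. It relies only on `split()`, `aov()` and `summary()`; no extra packages. Book B and Book C leave very few residual degrees of freedom (one Book B group has a single observation), so their p-values should be read with caution.

```r
# Full sample data from the question
df <- data.frame(
  Item = rep(c("Book A", "Book B", "Book C", "Book D"), times = c(4, 4, 5, 4)),
  Cost = c(7, 9, 6, 7,  4, 6, 5, 3,  5, 4, 7, 2, 2,  4, 2, 9, 4),
  Grouping = c("A", "B", "A", "B",  "A", "B", "A", "C",
               "C", "A", "C", "B", "B",  "A", "C", "C", "A")
)

# For each item, fit a one-way ANOVA and keep the p-value of the Grouping term
per_item_p <- sapply(split(df, df$Item), function(d) {
  fit <- aov(Cost ~ Grouping, data = d)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

result <- data.frame(Item = names(per_item_p), p.value = unname(per_item_p))
result
```

For items with only two groupings (Book A and Book D here), the one-way ANOVA F test is equivalent to a pooled-variance two-sample t-test, so `t.test(Cost ~ Grouping, data = d, var.equal = TRUE)` would give the same p-value.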

Quinten