1

I have some data that shows a treatment effect for a range of tree species, and I'm performing a one-way anova of RESULT ~ TREATMENT for each species. Using dplyr, I've created a new data frame of treatment means grouped by species, as well as standard errors and deviations. I'd like to add the compact letter display results to that same data frame.

Here's some dummy data to play with

library(dplyr)
library(tidyverse)
library(ggplot2)

# Generate species 
species <- rep (c("Oak", "Elm", "Ash"), each = 10)

# Generate treatments
dose_1 <- rep (c("Ctrl"), 30)
dose_2 <- rep (c ("L"), 30)

# Generate results
result_1 <- c((runif(10, 9, 12)), runif(10, 14, 16), runif(10, 6, 8), (runif(10, 2, 5)), runif(10, 1, 4), runif(10, 2, 4))


# Combine into a sinlge dataframe
  data <- data.frame (SPECIES = rep(species, 2), TREATMENT = c(dose_1, dose_2), RESULT = result_1)

# Consolidate into a new data frame for ggplot and add errors
dat <- data %>%
  group_by(SPECIES, TREATMENT) %>%
  summarise( 
    n=n(),
    mean=mean(RESULT),
    sd=sd(RESULT)
  ) %>%
  mutate( se=sd/sqrt(n))  %>%
  mutate( ic=se * qt((1-0.05)/2 + .5, n-1))

# Plot the results
ggplot(dat, aes(x= reorder(SPECIES, -mean), y = mean, fill = TREATMENT))+
  geom_bar(position = 'dodge', stat = 'identity')+
  geom_errorbar(aes(ymin = mean-se, ymax = mean+se), position = position_dodge(.9), width = 0.2)

enter image description here

Now I'd like to get the compact letter display results from a Tukey test for the treatment effect on each species and add it to dat

# First perform anova tests for treatment effects for each species
df_aov <- data %>%
  dplyr::group_by(SPECIES) %>%
  tidyr::nest() %>%
  dplyr::mutate(.data = .,
                aov_results = data %>% purrr::map(.x = ., .f = ~ summary(aov(RESULT ~ TREATMENT, data = .))))

# Inspect the results
df_aov$aov_results[[1]]
df_aov$aov_results[[2]]
df_aov$aov_results[[3]]

At this point I can only perform the Tukey test on the anova results for each species, like this

# Tukey's test
tukey <- TukeyHSD(df_aov$aov_results[[1]])

# compact letter display
cld <- multcompLetters4(df_aov$aov_results[[1]], tukey)

What I'd like to be able to do is perform the Tukey tests as batch on all the species and add the compact letter display results to the dat data frame that I'm using for ggplot. dat should end up looking something like this at the end

dat

# A tibble: 6 x 8
# Groups:   SPECIES [3]
  SPECIES TREATMENT     n  mean    sd    se    ic   cld
  <chr>   <chr>     <int> <dbl> <dbl> <dbl> <dbl>  <chr>
1 Ash     Ctrl         10  7.17 0.556 0.176 0.398    a
2 Ash     L            10  2.78 0.454 0.143 0.324    b
3 Elm     Ctrl         10 15.0  0.653 0.206 0.467    a
4 Elm     L            10  2.52 0.468 0.148 0.335    b
5 Oak     Ctrl         10 10.5  1.13  0.357 0.808    a
6 Oak     L            10  3.66 0.895 0.283 0.640    b

I'd then like to add a line of code to the ggplot like this, so I can add the compact letter display results to each species above the error bars.

 [first part of the plot code] + 
        geom_text(aes(label = cld, y = mean + se), vjust = -0.5, position = position_dodge(0.9),size = 3)
Yasin Br
  • 1,675
  • 1
  • 6
  • 16
A.Benson
  • 465
  • 1
  • 6
  • 16

2 Answers2

3

Using emmeans and multcomp you can obtain CLD quite easily and achieve what you are looking for by a combination of nesting and unnesting your data frame:

library(dplyr)
library(ggplot2)
library(emmeans)
library(multcomp)
library(tidyr)

# Generate species 
species <- rep (c("Oak", "Elm", "Ash"), each = 10)

# Generate treatments
dose_1 <- rep (c("Ctrl"), 30)
dose_2 <- rep (c ("L"), 30)

# Generate results
result_1 <- c((runif(10, 9, 12)), runif(10, 14, 16), runif(10, 6, 8), (runif(10, 2, 5)), runif(10, 1, 4), runif(10, 2, 4))


# Combine into a sinlge dataframe
data <- data.frame (species = rep(species, 2), treatment = c(dose_1, dose_2), result = result_1)

df_aov <- data %>%
  dplyr::group_by(species) %>%
  tidyr::nest() %>%
  rowwise() %>% 
  dplyr::mutate(aov_results = list(aov(result ~ treatment, data = data)), 
         emm = list(emmeans::emmeans(aov_results, "treatment", type= "response")), 
         cld = list(multcomp::cld(emm, Letters = LETTERS, reverse = TRUE))) %>% 
  dplyr::select(-data, -aov_results, -emm) %>% 
  unnest(cols = c(cld)) %>%
  dplyr::mutate(cld = trimws(.group)) %>% 
  dplyr::select(-.group)
  
 ggplot(df_aov, aes(x= reorder(species, -emmean), y = emmean, fill = treatment))+
  geom_bar(position = 'dodge', stat = 'identity')+
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), position = position_dodge(.9), width = 0.2) + 
  geom_text(aes(label = cld, y = upper.CL), vjust = -0.5, position = position_dodge(0.9),size = 3)
  

enter image description here

Stuart Demmer
  • 196
  • 1
  • 5
  • Hi Stuart - this solution looks clean, but I'm getting an error message after creating df_aov - `Error in select(., -group) : unused argument (-group)`. That seems to carry over to the ggplot code too - `Error in tapply(X = X, INDEX = x, FUN = FUN, ...) : object 'emmean' not found` Any suggestions to resolve this please? – A.Benson May 17 '22 at 21:28
  • Sorry about that - silly typo. `cld()` returns the CLDs as a column called `.group` which I changed to `cld` and then dropped the `group` column. I should have dropped `.group` - code has been edited. – Stuart Demmer May 17 '22 at 22:03
  • Hi Stuart. I'm still getting the error message, (`Error in select(., -.group) : unused argument (-.group)`) and your code is a little beyond my expertise to fix it up. I really want to use this solution as it's all very tidy. Would you mind taking another look please? – A.Benson May 17 '22 at 23:10
  • Sorry - it looks like it might be that you have another package loaded which also has a `select` function. Try these edits - hope it works this time. – Stuart Demmer May 18 '22 at 04:51
  • 1
    Bingo!! - Thanks very much. I thought it may have been something like that, but darned if I could work out which one. – A.Benson May 18 '22 at 04:58
-1

starting with dataframe data of your example, you could do the following:

  1. create dataframe with descriptive stats (as in your example)
library(multcompView)
library(dplyr)

df_summary <-
  data %>%
  group_by(SPECIES, TREATMENT) %>%
  summarise(n = n(),
            sd = sd(RESULT, na.rm = TRUE),
            mean=mean(RESULT)
            ) %>%
  mutate(se = sd/sqrt(n),
         ic = se * qt((1-0.05)/2 + .5, n-1)
         )
  1. create dataframe with Tukey stats (note the use of rowwise):
df_tukey <- 
  data %>%
  group_by(SPECIES) %>%
  nest %>%
  rowwise %>% ## !important
  mutate(aov_result = list(aov(RESULT ~ TREATMENT, data = data)),
         tukey = list(TukeyHSD(aov_result)),
         cld = list(multcompLetters4(aov_result, tukey)$TREATMENT$Letters )
         ) %>%
  unnest(cld) %>%
  select(SPECIES, LETTER = cld) %>%
  mutate(TREATMENT = names(LETTER))
  1. join both:
df_summary %>%
  left_join(df_tukey)
  • I get the following error after the join command - `Error in left_join(., df_tukey) : object 'df_summary' not found`. It can be resolved and the desired plot can be achieved with the following - `df_summary <- left_join(df_tukey, dat)`. If you'd like to edit your post I can mark the question as answered. Thanks very much – A.Benson May 17 '22 at 21:44
  • You did create **df_summary** as in step 1 of steps 1-3, didn't you? ;-) –  May 17 '22 at 22:01
  • I did, but I called it `dat`. This works - `df_summary %>% left_join(df_tukey)`, but in order to add the CLD letters to the `ggplot` - the desired result - the joined data need to be assigned to something, which can either be done as I've shown above, or with this `df_summary <-` and then your code. Your solution works, I just needed to play with it a little to get things onto the plot. Thanks very much. – A.Benson May 17 '22 at 22:08