How to add (select) p-values onto a multiple grouped boxplot

Question

I want to add select significance marker (preferably with automatic adjustment of spaces) onto grouped boxplots that are created by my ggplot function. This function is structured in a way that accommodates all of my other data, so I'd like to add significance in a way that's compatible to my graphing function.

Here's the sample dataset:

structure(list(Drug = c("Drug", "Drug", "Drug", "Drug", "Drug", 
"Drug", "Drug", "Drug", "Drug", "Drug", "Drug", "Drug", "Drug", 
"Drug", "Drug", "Drug", "Drug", "Drug", "Drug", "Drug", "Drug", 
"Drug", "Drug", "Drug", "Drug", "Drug", "Drug"), Drugcc = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), levels = c("0", "60", 
"250"), class = "factor"), Tat = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L), cocaine = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 25L, 25L, 25L, 25L, 25L, 
25L, 25L, 25L, 25L), Size = c(145.723, 546.358, 122.951, 555.149, 
781.476, 721.454, 521.449, 436.9, 1003.216, 322.241, 1045.689, 
596.625, 911.433, 884.375, 882.488, 707.296, 935.018, 830.33, 
588.493, 534.274, 922.811, 749.344, 592.114, 878.144, 899.342, 
821.622, 822.433), T_c = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L), levels = c("0 | 0", "6 | 0", "0 | 25", "6 | 25"
), class = "factor")), row.names = c(NA, -27L), groups = structure(list(
    Tat = c(0L, 0L, 0L, 6L, 6L, 6L, 6L, 6L, 6L), cocaine = c(0L, 
    0L, 0L, 0L, 0L, 0L, 25L, 25L, 25L), Drugcc = structure(c(1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("0", "60", "250"
    ), class = "factor"), .rows = structure(list(1:3, 4:6, 7:9, 
        10:12, 13:15, 16:18, 19:21, 22:24, 25:27), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -9L), .drop = TRUE, class = c("tbl_df", 
"tbl", "data.frame")), class = c("grouped_df", "tbl_df", "tbl", 
"data.frame"))

Here's a simplified ggplot function:

graphDrug <- function(.data, response, var1, var2, colorFill, color_fill_manual, legendFill, ytitle){
  ggplot(.data, aes(interaction(.data[[var1]],.data[[var2]], sep=" | "), .data[[response]], fill=factor(.data[[colorFill]])))+
    geom_boxplot(color = "black", alpha = 1, outlier.shape = NA, size=.25, position = position_dodge2(.75, preserve = "single"))+
    geom_point(pch = 21, alpha=.7, position = position_jitterdodge(jitter.width = .25))+
    
    scale_fill_manual(values = color_fill_manual)+

    theme_classic(base_size = par('din')[1] * 5)+
    theme(
      text = element_text(face = "bold"),
      legend.position = "top",
      panel.grid.major.x = element_blank(),
      axis.text=element_text(size=rel(1)),
      axis.text.y = element_text(angle = 90, vjust = 0.5, hjust = 0.5))
}

fig_name <- "Size" # y-axis title

color_fill_manual <- c("#ffffff", "#A6CEE3", "#1F78B4") # white, blue pairs
with(df, graphDrug(df, "Size", "Tat", "cocaine", "Drugcc", color_fill_manual, "Drug", bquote(bold(.(fig_name)~~(μm^2))))) # Bold y-axis title

This is what I've tried without success:

my_comparisons <- list( c("0 | 0.0", "0 | 0.60"), c("0 | 0.0", "0 | 0.250"),
                        c("6 | 0.0", "6 | 0.60"), c("6 | 0.0", "6 | 0.250"),
                        c("6 | 25.0", "6 | 25.60"), c("6 | 25.0", "6 | 25.250"))

p <- with(df, graphDrug(df, "Size", "Tat", "cocaine", "Drugcc", color_fill_manual, "Drug", bquote(bold(.(fig_name)~~(μm^2)))))
p+stat_compare_means(comparisons = my_comparisons, method = "wilcox.test")

This is closest that I came to my desired result, but it's still not what I want:

library(ggpubr)
# Box plots
bxp <- ggboxplot(
  df, x = "T_c", y = "Size", fill = "Drugcc",
  palette = (values = c(color_fill_manual))
  )
bxp

stat.test <- df %>%
  group_by(T_c) %>%
  pairwise_wilcox_test(Size ~ Drugcc, ref.group = "0", p.adjust.method = "BH")
stat.test

# Box plots with p-values
stat.test <- stat.test %>%
  add_xy_position(x = "T_c", dodge = 0.8)

bxp +
  stat_pvalue_manual(
    stat.test, label = "p.adj.signif", tip.length = 0.01,
    bracket.nudge.y = -2
    )

Here's the desired result. Although the drawing is crude, I hope you get the effect:

Note about the real dataset:

There are many different Drugs.
The x-axis can be more than 3 interaction groups.
This code and this question are close to what I want, but I can't get them to work.

Thank you for your help.

score 1 · Answer 1 · answered Mar 28 '23 at 14:32

Have you considered creating a new grouping variable as a combination of Groupcc and T_c?

In that way it would be easier to specify the desired comparisons.

library(ggplot2)
library(ggpubr)

# using your data as df

# create new grouping variable
df$group=paste(df$T_c, df$Drugcc, sep = " - ")
df$group=factor(df$group, levels = unique(df$group))

# create the list of comparisons
levels=levels(df$group)
my_comparisons <- list(c(levels[1], levels[2]),
                       c(levels[1], levels[3]),
                       c(levels[4], levels[6]),
                       c(levels[7], levels[8]))

# basic plot
ggplot(df, aes(x=group, y=Size, fill=as.factor(Drugcc))) + geom_boxplot() + 
  scale_fill_manual(values=c("#ffffff", "#A6CEE3", "#1F78B4"))+
  stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif") +
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1))+
  scale_x_discrete(labels=c("", "0|0", "","", "6|0","","", "6|25", ""))+
  theme(legend.position="top")+
  labs(fill="Drug (uM)")

Hope it helps!

Thank you. 3 questions. 1) Is this the only way? I have 20 more dataframes that are to be graphed and creating a new group is something that I want to avoid because I have to manually set the group level order. 2) Is there anyway to make the x-axis distance mirror my screenshot? Make within 0|0 groups closer together and 0|0 vs 6|0 distance further apart. 3) Can the height of the p-values be evened out (automatically) to make it easier to understand? — CeC, Mar 28 '23 at 16:27
1) i don´t know if thats the only way 2) and 3) I will add another code to adjust that, but when we even the height of the p values, they will overlap, so I don´t think it will make more easy to understand — juan_bzt, Mar 28 '23 at 18:49
Hi, do you have any recommendation for clustering the sample points along with the boxplot? I'd like to add `geom_point` to my ggplot, but I'm having trouble aligning the points to the box. Thank you! — CeC, Mar 29 '23 at 15:25

score 1 · Answer 2 · answered Mar 28 '23 at 18:53

Regarding clustering the treatments, and making the labels of the p.value the same height, you can adjust the code, but I don´t think is a good idea

# basic plot
ggplot(df, aes(x=group, y=Size, fill=as.factor(Drugcc))) + 
  geom_boxplot(width=0.3, position = position_nudge(x = c(0.2,0, -0.2))) + 
  scale_fill_manual(values=c("#ffffff", "#A6CEE3", "#1F78B4"))+
  stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.signif", label.y=max(df$Size)*1.1) +
  theme(axis.text.x = element_text(angle = 0, vjust = 1, hjust=1),
        axis.ticks.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())+
  scale_x_discrete(labels=c("", "0|0", "","", "6|0","","", "6|25", ""))+
  theme(legend.position="top")+
  labs(fill="Drug (uM)")

Thank you! 1) I agree, could I have longer lines (significance) on the top at same level and shorter lines at the bottom on the same level, for readability? 2) How can I `position_nudge(x = c(0.2,0, -0.2))` for `geom_point`? 3) Is ggpubr easier than ggsignif or others? Thanks so much, again! — CeC, Mar 28 '23 at 20:54
2) I don´t know if you can use the opsition nudge in geom point 3) I´ve never used ggsignif sorry I can´t be of more help — juan_bzt, Mar 29 '23 at 16:33

How to add (select) p-values onto a multiple grouped boxplot

2 Answers2