I am having trouble adding p-values to a ggplot when the axis is logarithmic and the values to be plotted are all well below 1. It seems that no matter where I tell the function to put the p-value, it always puts it at or above 1, which often ruins my scale.

An MRE:

library(ggplot2)
library(ggpubr)

df <- data.frame("group" = rep(c("A", "B", "C", "D", "E"), each = 5), 
                 "value" = exp(seq(-10,-9, length.out = 25)))

stat_df <- ggpubr::compare_means(formula = value ~ group, data = df, method = "wilcox.test")[1:3,]

p <- ggplot(data = df, aes(x = group, y = value)) +
    geom_boxplot() +
    ggpubr::stat_pvalue_manual(data = stat_df, y.position = 1e-4, step.increase = 0) +
    scale_y_continuous(trans = "log10")

plot(p)

which produces:

[Image: boxplot of values showing the p-value placed at 1, far above the data]

As you can see, even though I told ggpubr to put the p-value at 1e-4, it put it at 1 (1e0) instead. For values above 1, you can just give it the log10 of the position you want (e.g. y.position = 11 plots it at 1e11). But if you pass 0 or a negative value for y.position, the p-value does not show up at all, and you get the following warnings:

Warning messages:
1: In self$trans$transform(x) : NaNs produced
2: Transformation introduced infinite values in continuous y-axis 
3: Removed 3 rows containing non-finite values (stat_bracket).
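
The first warning points at the scale's transform step, and the pattern is consistent with how base R's log10 treats these inputs. A minimal sketch using only base R (the vector name y_positions is just for illustration, and this does not reproduce ggpubr's internals):

```r
# Candidate y.position values: the intended 1e-4, the observed 1,
# and the zero / negative values that trigger the warnings.
y_positions <- c(1e-4, 1, 0, -4)

# log10() of a positive number is finite, log10(0) is -Inf,
# and log10() of a negative number is NaN (with a "NaNs produced" warning).
transformed <- suppressWarnings(log10(y_positions))

print(data.frame(y_position = y_positions, log10 = transformed))
```

Any position that becomes -Inf or NaN after the transform is non-finite, which matches the "Removed 3 rows containing non-finite values" message above.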

I'm open to using other packages to plot p-values; ggpubr::stat_pvalue_manual has just been the most flexible and useful for my purposes so far. The only workaround I have found is a hacky solution using scales::pseudo_log_trans and some trial and error, which is far from ideal because it produces different axes than a regular log10 transformation.

tlbello

1 Answer

I have two solutions for you:

Solution 1

Play around with the vjust and bracket.nudge.y arguments in stat_pvalue_manual to find the optimal values to use. This solution still transforms the axis using scale_y_continuous.

library(ggplot2)

ggplot(data = df, aes(x = group, y = value)) +
  geom_boxplot() +
  ggpubr::stat_pvalue_manual(data = stat_df, y.position = 1, 
                             step.increase = 0, vjust = 0.1, 
                             bracket.nudge.y = -4.9, tip.length = 0.001) +
  scale_y_continuous(trans = "log10")

Solution 2

This solution abandons the use of scale_y_continuous to log-transform the axis. Instead, the transformation is applied to the values themselves (log10(value)), and y.position is given on the same log10 scale. scale_y_continuous is then used only to format the y-axis labels into your desired format.

ggplot(data = df, aes(x = group, y = log10(value))) +
  geom_boxplot() +
  ggpubr::stat_pvalue_manual(data = stat_df, y.position = log10(1e-04)) +
  scale_y_continuous(labels = \(x) formatC(10^x, format = "e", digits = 1))
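
When the data are transformed by hand like this, the default breaks are chosen on the log10 values rather than at powers of ten. One possible way to get decade breaks across many plots is sketched below. The helpers pow10_breaks and pow10_labels are hypothetical names, not part of ggplot2 or ggpubr, and use only base R:

```r
# Hypothetical helper: integer exponents covering the range of the
# log10-transformed values, i.e. one break per power of ten.
pow10_breaks <- function(log_values) {
  rng <- range(log_values, finite = TRUE)
  seq(floor(rng[1]), ceiling(rng[2]), by = 1)
}

# Hypothetical helper: label an exponent x as the value 10^x in
# scientific notation, mirroring the formatC() call in Solution 2.
pow10_labels <- function(x) {
  formatC(10^x, format = "e", digits = 0)
}

# Same values as in the question's example data.
value <- exp(seq(-10, -9, length.out = 25))
breaks <- pow10_breaks(log10(value))

print(breaks)
print(pow10_labels(breaks))
```

These could then be passed to Solution 2 as scale_y_continuous(breaks = pow10_breaks(log10(df$value)), labels = pow10_labels). Breaks that fall outside the plotted range should simply not be drawn, so for data spanning less than one decade you may also need to widen the axis limits.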

Created on 2023-01-14 with reprex v2.0.2

benson23
    Thank you very much. The second solution is the one I went with. I had colleagues that went that route (transforming the data itself) and was hoping to avoid it, but for no particularly good reason other than it sometimes makes the axis breaks a bit weird. The first solution feels a bit too hacky to me and I have concerns about trying to do it programmatically (I've got a lot of plots to make), but I very much appreciate it for completeness! – tlbello Jan 16 '23 at 18:01