I've created a side-by-side boxplot using ggplot2.

library(ggplot2)

p <- ggplot(mtcars, aes(x=factor(cyl), y=mpg))
p + geom_boxplot(aes(fill=factor(cyl)))

I want to annotate the plot with the minimum, maximum, 1st quartile, median and 3rd quartile of each box. I know geom_text() can add labels, and maybe fivenum() is useful for computing the values, but I cannot figure out how to combine them so that these values are displayed in the plot.

Paul

3 Answers

The most succinct way I can think of is to use stat_summary. I've also mapped the labels to a color aesthetic, but you can, of course, set the labels to a single color if you wish:

ggplot(mtcars, aes(x=factor(cyl), y=mpg, fill=factor(cyl))) + 
  geom_boxplot(width=0.6) +
  stat_summary(geom="text", fun.y=quantile,
               aes(label=sprintf("%1.1f", ..y..), color=factor(cyl)),
               position=position_nudge(x=0.33), size=3.5) +
  theme_bw()

In the code above we use quantile as the summary function to get the label values. ..y.. refers back to the output of the quantile function (in general, ..*.. is a ggplot construction for using values calculated within ggplot).
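
For readers on a recent ggplot2, the same idea can be written with the newer names. This is a sketch that assumes ggplot2 3.3.0 or later, where `fun` is the replacement for the deprecated `fun.y` argument and `after_stat(y)` is the replacement for the `..y..` notation; the plot itself is intended to be unchanged.

```r
library(ggplot2)

# Same plot as above, using fun instead of fun.y and after_stat(y)
# instead of ..y.. (assumes ggplot2 >= 3.3.0).
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot(width = 0.6) +
  stat_summary(
    geom = "text", fun = quantile,  # quantile() returns five values per group
    aes(label = sprintf("%1.1f", after_stat(y)), color = factor(cyl)),
    position = position_nudge(x = 0.33), size = 3.5
  ) +
  theme_bw()

p
```

Each group contributes five labels, one per value returned by quantile().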

eipi10
  • Nice answer. A trivial difference, but I am not sure ggplot uses the same five-number summary as fivenum, so the values may differ. – user20650 Jun 26 '16 at 00:52
  • @user20650 (isn't it about time you gave yourself a more distinctive SO name?) Doesn't ggplot call whatever function is given in `fun.y`, so wouldn't it just call the `fivenum` function from `stats`? – eipi10 Jun 26 '16 at 01:18
  • Ah, now I see your point. geom_boxplot is using quantile, but fivenum is using a different algorithm. I've updated my answer to use quantile. – eipi10 Jun 26 '16 at 01:53
  • How can I plot the upper and lower whisker values on the boxplot as the maximum and minimum (instead of the outlier values)? For example, in the last boxplot (blue), the minimum and maximum are outliers. – Andre230 May 17 '22 at 11:31
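
The difference between fivenum and quantile raised in the comments can be checked directly in base R. This is a small illustration on a made-up four-element vector (not data from the question); it relies on quantile()'s default algorithm (type 7).

```r
# Toy vector chosen so that the two methods disagree on the quartiles.
x <- c(1, 2, 3, 4)

f <- fivenum(x)   # Tukey's five-number summary (hinges)
q <- quantile(x)  # default type 7 quantiles at 0%, 25%, 50%, 75%, 100%

f  # 1.00 1.50 2.50 3.50 4.00
q  # 1.00 1.75 2.50 3.25 4.00
```

The minimum, median and maximum agree; only the lower and upper quartiles differ, which is why labels computed with fivenum may not line up exactly with the box edges drawn by geom_boxplot.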

One way is to simply make the data.frame you need, and pass it to geom_text or geom_label:

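library(ggplot2)
library(dplyr)

# One row per group and statistic: five fivenum() values for each cyl
cyl_fivenum <- mtcars %>% 
    group_by(cyl) %>% 
    summarise(five = list(fivenum(mpg))) %>% 
    tidyr::unnest(five)   # name the list column; newer tidyr warns if it is omitted

ggplot(mtcars, aes(x=factor(cyl), y=mpg)) + 
    geom_boxplot(aes(fill=factor(cyl))) + 
    geom_text(data = cyl_fivenum, 
              aes(x = factor(cyl), y = five, label = five), 
              nudge_x = .5)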
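
As a variation on the same approach, the label data frame can also be built in base R with a name attached to each statistic, so the labels say which value is which. This is only a sketch: the names `stat_names`, `by_cyl` and `label_df` are made up for illustration, and fivenum is kept as in the code above (so quartile labels may differ slightly from the box edges).

```r
library(ggplot2)

stat_names <- c("min", "Q1", "median", "Q3", "max")  # hypothetical label names

# Split mpg by cyl, then build one row per group and statistic.
by_cyl <- split(mtcars$mpg, mtcars$cyl)
label_df <- do.call(rbind, lapply(names(by_cyl), function(k) {
  data.frame(cyl = k, stat = stat_names, value = fivenum(by_cyl[[k]]))
}))

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(aes(fill = factor(cyl))) +
  geom_text(data = label_df,
            aes(x = factor(cyl), y = value,
                label = paste0(stat, ": ", value)),
            nudge_x = 0.5, size = 3)

p
```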

alistaire

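In case anyone is dealing with large ranges and has to log10-transform the y-axis, I found some code that works great. Just add scale_y_log10() and use 10^..y.. for the label. The scale transformation is applied before the summary is computed, so ..y.. holds quantiles of the log10-transformed values; if you don't add 10^ before ..y.., the labels show those log-transformed values instead of the actual ones.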

Does not work

ggplot(mtcars, aes(x=factor(cyl), y=mpg, fill=factor(cyl))) + 
  geom_boxplot(width=0.6) +
  stat_summary(geom="text", fun.y=quantile,
               aes(label=sprintf("%1.1f", ..y..), color=factor(cyl)),
               position=position_nudge(x=0.45), size=3.5) +
  scale_y_log10()+
  theme_bw()

Works great

ggplot(mtcars, aes(x=factor(cyl), y=mpg, fill=factor(cyl))) + 
  geom_boxplot(width=0.6) +
  stat_summary(geom="text", fun.y=quantile,
               aes(label=sprintf("%1.1f", 10^..y..), color=factor(cyl)),
               position=position_nudge(x=0.45), size=3.5) +
  scale_y_log10()+
  theme_bw()
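
What the 10^ back-transformation does can be seen in base R without any plotting. The sketch below uses a made-up vector spanning several orders of magnitude (not data from the question). One caveat worth knowing: because the quantiles are interpolated on the log scale, the back-transformed quartiles can differ from quantile() applied to the raw values, while the minimum and maximum always match.

```r
# Toy data with a wide range, standing in for a log10-scaled y variable.
x <- c(1, 10, 100, 1000)

q_log <- quantile(log10(x))  # what the summary sees after scale_y_log10()
back  <- 10^q_log            # what 10^..y.. displays
raw   <- quantile(x)         # quantiles computed on the original scale

q_log  # 0.00 0.75 1.50 2.25 3.00
back   # minimum and maximum are 1 and 1000; 25% is 10^0.75, about 5.62
raw    # 25% is 7.75 here, so it differs from the back-transformed value
```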

TheSciGuy