
I have a very small number of extreme outliers in my dataset, and they make the boxplots difficult to read:

library(ggplot2)
mtcars$mpg[1] <- 60
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot()


Hence, I would like to indicate the extreme outliers like this:

[image: the desired plot]

Any ideas how to do this in ggplot2? Transforming the axis is not an option for me.

chamaoskurumi

1 Answer


This is a start:

library("ggplot2")
mtcars$mpg[1:2] <- c(50,60)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot()

Define the maximum y value to display:

maxval <- 40

Use dplyr (this could also be done in base R or plyr) to extract the outliers and build the label text:

library("dplyr")
dd <- mtcars %>%
    filter(mpg > maxval) %>%    # keep only values above the cutoff
    group_by(cyl) %>%
    summarise(outlier_txt = paste(mpg, collapse = ","))  # one string per cyl
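Since this step could also be done in base R, here is a minimal sketch using aggregate(), assuming the same modified mtcars and maxval = 40 as above; the names big and dd_base are chosen here for illustration only:

```r
# same data modification and cutoff as in the answer
mtcars$mpg[1:2] <- c(50, 60)
maxval <- 40

# keep only the rows above the cutoff
big <- mtcars[mtcars$mpg > maxval, c("cyl", "mpg")]
# collapse the remaining mpg values per cyl into one comma-separated string
dd_base <- aggregate(mpg ~ cyl, data = big,
                     FUN = function(x) paste(x, collapse = ","))
names(dd_base)[names(dd_base) == "mpg"] <- "outlier_txt"
dd_base
```

The result has the same cyl and outlier_txt columns as dd, so it can be passed to geom_text() and geom_segment() in the same way.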

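Cap the y axis at maxval and add an arrow plus a label:

library("grid") # arrow() and unit(); recent ggplot2 versions also re-export both
p2 <- p + geom_boxplot() +
    # zoom with coord_cartesian() rather than scale limits: scale limits drop
    # the values above maxval before the boxplot statistics are computed
    coord_cartesian(ylim=c(min(mtcars$mpg),maxval)) +
    # label listing the outlying values, just below the top of the panel
    geom_text(data=dd,aes(y=maxval,label=outlier_txt),
              size=3,vjust=1.5,hjust=-0.5) +
    # short upward arrow ending at the top of the panel
    geom_segment(data=dd,aes(y=maxval*0.95,yend=maxval,
                             xend=factor(cyl)),
                 arrow = arrow(length = unit(0.1,"cm")))
p2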
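If the same trick is needed for several plots, the steps above can be wrapped in a function. The sketch below is one possible way to do it, assuming ggplot2 >= 3.0 (for the .data pronoun inside aes()); the function name capped_boxplot() and its arguments are hypothetical. It zooms with coord_cartesian() rather than setting scale limits, so the boxplot statistics still include the capped values, and it builds the labels with base R aggregate() instead of dplyr:

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2 or the original answer): boxplots
# of `value` by `group`, with the y axis capped at `maxval` and any values
# above the cap reported as an arrow plus a text label at the top.
capped_boxplot <- function(data, group, value, maxval) {
  p <- ggplot(data, aes(x = factor(.data[[group]]), y = .data[[value]])) +
    geom_boxplot() +
    # zoom instead of setting scale limits, so the boxes use all the data
    coord_cartesian(ylim = c(min(data[[value]]), maxval)) +
    labs(x = group, y = value)

  big <- data[data[[value]] > maxval, c(group, value)]
  if (nrow(big) == 0) return(p)  # nothing above the cap: plain boxplot

  # one row per group: the values above the cap, collapsed to e.g. "50,60"
  dd <- aggregate(big[[value]], by = list(grp = big[[group]]),
                  FUN = function(v) paste(v, collapse = ","))
  names(dd)[2] <- "outlier_txt"
  # use the same levels as the discrete x axis
  dd$grp <- factor(dd$grp, levels = levels(factor(data[[group]])))

  p +
    geom_text(data = dd, aes(x = grp, y = maxval, label = outlier_txt),
              inherit.aes = FALSE, size = 3, vjust = 1.5, hjust = -0.5) +
    geom_segment(data = dd,
                 aes(x = grp, xend = grp, y = maxval * 0.95, yend = maxval),
                 inherit.aes = FALSE,
                 arrow = arrow(length = unit(0.1, "cm")))
}

mtcars$mpg[1:2] <- c(50, 60)
p3 <- capped_boxplot(mtcars, "cyl", "mpg", maxval = 40)
print(p3)
```

Passing the grouping and value columns as strings keeps the function usable with other data frames; if no value exceeds maxval, it simply returns the capped boxplot without annotations.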


Ben Bolker