
I know that by default geom_boxplot() treats points beyond these limits as outliers:

  • Q3 + 1.5 * IQR
  • Q1 - 1.5 * IQR

but I want to use 4 standard deviations from the mean instead:

  • MEAN + 4 * SD
  • MEAN - 4 * SD

Is this possible to do in ggplot2? If not, what is the alternative?

I saw a post that asked about changing to a different IQR multiplier, but I am specifically interested in switching to a standard-deviation-based definition.

Claus Wilke
Sheila
  • maybe this could help: https://cran.r-project.org/web/packages/ggplot2/vignettes/extending-ggplot2.html – MLavoie Jan 19 '18 at 19:45
  • @MLavoie Thanks for the link. I just looked at it and couldn't find anything specific to changing the outlier definition to SD. Did you have a specific section in mind? – Sheila Jan 19 '18 at 19:48
  • You can create a new stat. It might be used in combination with stat_summary, which can be included in your geom_boxplot() – MLavoie Jan 19 '18 at 19:52
  • The closest thing I can see to this is in the documentation here: https://www.rdocumentation.org/packages/ggplot2/versions/2.2.1/topics/geom_boxplot . See the point about `coef` and perhaps even `outliers` – InfiniteFlash Jan 19 '18 at 20:16
  • Extending on @InfiniteFlashChess' comment: use `geom_boxplot(..., stat = "identity")` and see `?geom_boxplot` for an example on how to change the default computations of this function. – markus Jan 19 '18 at 20:18
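
For context on the `coef` suggestion above: `coef` only rescales the IQR-based rule, so it cannot express a mean +/- SD cutoff. The sketch below uses base R's `boxplot.stats()` as a stand-in to illustrate the idea; it is an assumption that this mirrors `geom_boxplot(coef = ...)` closely enough for illustration (the two may compute quartiles slightly differently), and the variable names are made up for this example.

```r
set.seed(123)
y <- rnorm(100)

# IQR-based rule with the default multiplier of 1.5
out_default <- boxplot.stats(y, coef = 1.5)$out

# A larger multiplier widens the IQR-based fences, so it can only
# flag the same points or fewer
out_wide <- boxplot.stats(y, coef = 3)$out

# An SD-based rule is a different definition and has to be computed by hand
sd_cutoffs <- mean(y) + c(-4, 4) * sd(y)
out_sd <- y[y < sd_cutoffs[1] | y > sd_cutoffs[2]]
```

Whatever value `coef` takes, the fences stay anchored at the quartiles, which is why the SD-based limits need to be calculated separately.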

1 Answer


This can be done with stat = "identity", as mentioned in some of the comments, but the trick is to get the outliers into the data. Outliers need to be provided in a list column. Here's how you do it.

First, make up some data and draw a regular boxplot:

library(ggplot2)

# three groups of 100 normal draws with different means
set.seed(123)
d <- data.frame(y = c(rnorm(100), rnorm(100)+.5, rnorm(100)-1),
                x = rep(c("A", "B", "C"), each = 100))

ggplot(d, aes(x, y)) + geom_boxplot()


Now, calculate the stats manually and draw the boxplot with the alternative outlier definition. Note that I use mean +/- 2*SD here so that I get a few more outliers. To get +/- 4*SD, replace the two 2*sd terms with 4*sd.

library(dplyr)

d %>% group_by(x) %>%
  summarize(middle = median(y),
            mean = mean(y),
            sd = sd(y),
            # box edges: first and third quartiles
            lower = quantile(y, probs = .25),
            upper = quantile(y, probs = .75),
            # whisker ends: mean +/- 2*SD, clamped to the observed range
            ymin = max(mean - 2*sd, min(y)),
            ymax = min(mean + 2*sd, max(y)),
            # list column: one vector of outlying values per group
            outliers = list(y[y<ymin | y > ymax])) %>%
  ggplot(aes(x, ymin = ymin, lower = lower,
             middle = middle, upper = upper, ymax = ymax,
             outliers = outliers)) + 
  geom_boxplot(stat = "identity")


Disclaimer: I've only tested this with the current development version of ggplot2; I'm not sure whether it works with the version currently on CRAN.
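
For the +/- 4*SD case asked about in the question, the same computation can be wrapped in a small function with the multiplier as an argument. This is only a sketch under the same assumptions as the code above: `sd_box_stats` and `k` are hypothetical names introduced here, not part of ggplot2 or dplyr, and the function expects columns named `x` and `y` as in the example data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper (not a ggplot2 function): per-group box statistics
# with whiskers at mean +/- k * SD, clamped to the observed data range.
sd_box_stats <- function(data, k = 4) {
  data %>%
    group_by(x) %>%
    summarize(
      middle = median(y),
      lower = quantile(y, probs = .25),
      upper = quantile(y, probs = .75),
      ymin = max(mean(y) - k * sd(y), min(y)),
      ymax = min(mean(y) + k * sd(y), max(y)),
      # list column holding the values outside the whiskers for each group
      outliers = list(y[y < ymin | y > ymax])
    )
}

# Same example data as above
set.seed(123)
d <- data.frame(y = c(rnorm(100), rnorm(100) + .5, rnorm(100) - 1),
                x = rep(c("A", "B", "C"), each = 100))

stats4 <- sd_box_stats(d, k = 4)

ggplot(stats4, aes(x, ymin = ymin, lower = lower, middle = middle,
                   upper = upper, ymax = ymax, outliers = outliers)) +
  geom_boxplot(stat = "identity")
```

With roughly normal data and only 100 points per group, very few or no points are expected beyond 4 standard deviations, so the plot may show no outliers at all. Note also that a very small `k` could place a whisker end inside the box, which this sketch does not guard against.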

Claus Wilke