8

I'm currently plotting a number of different distributions of first differences from a number of regression models in ggplot. To facilitate interpretation of the differences, I want to mark the 2.5% and the 97.5% percentile of each distribution. Since I will be doing quite a few plots, and because the data is grouped in two dimension (model and type), I would like to define and plot the respective percentiles in the ggplot environment. Plotting the distributions using facets gets me to exactly where I want except for the percentiles. I could of course do this more manually, but I would ideally want to find a solution where I am still able to use facet_grid, since this spared me a lot of hassle trying to fit the different plots together.

Here's an example using simulated data:

df.example <- data.frame(model = rep(c("a", "b"), length.out = 500), 
                      type = rep(c("t1", "t2", "t2", "t1"), 
                      length.outh = 250), value = rnorm(1000))

 ggplot(df.example, aes(x = value)) +
 facet_grid(type ~ model) +
 geom_density(aes(fill = model, colour = model))

I've tried to add quantiles in two ways. The first one produces an error message:

 ggplot(df.example, aes(x = value)) +
 facet_grid(. ~ model) +
 geom_density(aes(fill = model, colour = model)) +
 geom_vline(aes(x = value), xintercept = quantile(value, probs = c(.025, .975)))
Error in quantile(value, probs = c(0.025, 0.975)) : object 'value' not found

While the second one gets me the quantiles for the the complete variable and not for the sub-densities. That is, the plotted quantiles are identical for all four densities.

 ggplot(df.example, aes(x = value)) +
 facet_grid(type ~ model) +
 geom_density(aes(fill = model, colour = model)) +
 geom_vline(xintercept = quantile(df.example$value, probs = c(.025, .975)))

I consequently wonder if there is a way to plot the specific quantiles for each subgroup within the ggplot2 environment?

Greatly appreciate any input.

rcs
  • 67,191
  • 22
  • 172
  • 153
chrstnsn
  • 237
  • 1
  • 3
  • 6

4 Answers4

5

Use plyr (or dplyr, data.table) to precompute these values ...

set.seed(1)
# ...

df.q <- ddply(df.example, .(model, type),
              summarize, q=quantile(value, c(.025, .975)))    
p + geom_vline(aes(xintercept=q), data=df.q)

plot

rcs
  • 67,191
  • 22
  • 172
  • 153
5

You can calculate the quantiles beforehand.

Using your example data:

library (dplyr)
d2 <- df.example %>%
  group_by(model, type) %>%
  summarize(lower = quantile(value, probs = .025),
            upper = quantile(value, probs = .975))

And then plot like this:

ggplot(df.example, aes(x = value)) +
  facet_grid(type ~ model) +
  geom_density(aes(fill = model, colour = model)) +
  geom_vline(data = d2, aes(xintercept = lower)) +
  geom_vline(data = d2, aes(xintercept = upper))

enter image description here

Axeman
  • 32,068
  • 8
  • 81
  • 94
3

Nowadays, it’s possible to use stat_summary() with the orientation option to achieve the same result without precomputation.

Define a dummy y value for each panel to group the observations along with orientation = "y". Then use a custom fun to compute a vector of desired quantiles for each panel in stat_summary(). To display the result as vertical lines, specify geom = "vline" and its required xintercept from the computed x values with xintercept = after_stat(x) in the aesthetic specification, now using the result computed with fun.

library(ggplot2)

set.seed(1)

df.example <- data.frame(
  model = rep(c("a", "b"), length.out = 500),
  type = rep(c("t1", "t2", "t2", "t1"),
    length.outh = 250
  ), value = rnorm(1000)
)

ggplot(df.example, aes(x = value)) +
  facet_grid(type ~ model) +
  geom_density(aes(fill = model, colour = model)) +
  stat_summary(
    geom = "vline",
    orientation = "y",
    # y is a required aesthetic, so use a dummy value
    aes(y = 1, xintercept = after_stat(x)),
    fun = function(x) {
      quantile(x, probs = c(0.025, 0.975))
    }
  )

Mikko Marttila
  • 10,972
  • 18
  • 31
  • 2
    (+1) If you use `fun` instead of `fun.data`, you can make this slightly simpler: `stat_summary(aes(y = 1, xintercept = after_stat(x)), fun = function(x) quantile(x, probs = c(0.025, 0.975)), geom = "vline", orientation = "y")` – Axeman Feb 02 '22 at 19:26
  • Or `stat_summary(aes(y = 1, xintercept = after_stat(x)), fun = quantile, fun.args = list(probs = c(0.025, 0.975)), geom = "vline", orientation = "y")`. – Axeman Feb 02 '22 at 19:28
  • 1
    @Axeman aah yeah thanks for pointing that out. That would be simpler. Another way that comes to mind now would be to keep `fun.data` but change the data frame to have a column called `yintercept` and drop the `xintercept = stat(x)` from the `aes()` altogether. – Mikko Marttila Feb 02 '22 at 22:06
  • 1
    That said, I think the `fun` approach is definitely easier to actually remember. I've edited it in, favouring the anonymous function still as I find it more readable, personally. Thanks again! – Mikko Marttila Feb 02 '22 at 22:15
-1

Good question. The more general version of the same question is: how do you call functions on the subsetted datasets when using facets? This seems like a very useful feature and so I searched around but could not find anything about it.

The answers already given are excellent. Another option is to use multiplot() as a way of doing the faceting manually.

CoderGuy123
  • 6,219
  • 5
  • 59
  • 89
  • I agree. Both solutions are very neat, but, as you note, does not really solve the issue I adress in my question. This would indeed by a very interesting feature for ggplot. – chrstnsn Jun 01 '15 at 09:30