
Sometimes you want to limit the axis range of a plot to a region of interest so that certain features (e.g. the location of the median and quartiles) are emphasized. Nevertheless, it may be important to make clear how many values (or what proportion of them) lie outside the truncated axis range.

I am trying to show this with ggplot2 in R and am wondering whether ggplot2 has a built-in way of doing this (or, alternatively, whether some of you have used a sensible workaround). I am not wedded to any particular display: for example, jittered points drawn with a different symbol at the edge of the plot, a small bar outside the plot whose fill level shows the proportion of values beyond the range, or any other display that conveys this information.

Below is some example code that creates mock data and the kind of plot I have in mind (shown below the code), but the plot gives no clear indication of how much data lies outside the y-axis range.

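library(ggplot2)
set.seed(seed = 123)
# Two groups of 500 Cauchy-distributed values (location 0 and 1, scale 10);
# the heavy tails put many values far outside the plotted range
group <- rep(c(0, 1), each = 500)
y <- rcauchy(1000, group, 10)
mockdata <- data.frame(group, y)

# coord_cartesian() zooms in without dropping data, so the boxplot
# statistics are still computed from all values
ggplot(mockdata, aes(factor(group), y)) +
  geom_boxplot(aes(fill = factor(group))) +
  coord_cartesian(ylim = c(-40, 40))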

[Plot: boxplots of y for groups 0 and 1, with the y-axis limited to -40 to 40]
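One of the displays mentioned above, out-of-range values drawn at the edge of the plot with a different symbol, can be sketched with plain ggplot2. This is a workaround under assumptions, not a built-in ggplot2 feature: values beyond the limits are clamped onto the nearest limit with `pmin()`/`pmax()` and drawn as jittered open triangles (pointing up above the range, down below it). The limits of -40 and 40 and the helper columns `y_edge` and `direction` are illustrative choices.

```r
library(ggplot2)

# Same mock data as in the question, with group stored as a factor
set.seed(123)
group <- rep(c(0, 1), each = 500)
y <- rcauchy(1000, group, 10)
mockdata <- data.frame(group = factor(group), y = y)

lower_lim <- -40
upper_lim <- 40

# Keep only the values beyond the limits and clamp them onto the nearest limit
outside <- subset(mockdata, y < lower_lim | y > upper_lim)
outside$y_edge <- pmin(pmax(outside$y, lower_lim), upper_lim)
outside$direction <- ifelse(outside$y > upper_lim, "above", "below")

p <- ggplot(mockdata, aes(group, y)) +
  geom_boxplot(aes(fill = group)) +
  # Clamped values: jittered sideways only (height = 0) so they stay on the limit
  geom_jitter(data = outside, aes(y = y_edge, shape = direction),
              width = 0.2, height = 0) +
  # Shape 2 is an upward open triangle, shape 6 a downward one
  scale_shape_manual(values = c(above = 2, below = 6), name = "Outside range") +
  coord_cartesian(ylim = c(lower_lim, upper_lim))

print(p)
```

Because `coord_cartesian()` only zooms, the boxes and whiskers are still computed from all values; the triangles merely mark how many values were pushed to each edge.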

Björn
  • You can use the `quantile` function, e.g. `coord_cartesian(ylim = quantile(mockdata$y, probs = c(0.10, 0.90)))`; that way you can represent what percentage of points are cut off. – A Gore Jul 26 '17 at 19:19
  • Your example code would truncate at the 10th and 90th percentiles of the pooled data from both groups, but these may differ between the two groups. I was hoping for a way to actually show information about the truncation on the plot. – Björn Jul 27 '17 at 04:25
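Björn's objection can be checked numerically. A base-R sketch, reusing the question's mock data (the object names `pooled`, `by_group` and `share_outside` are illustrative), compares the pooled 10th and 90th percentiles with the per-group ones and reports what share of each group falls outside the pooled range:

```r
# Same mock data as in the question, with group stored as a factor
set.seed(123)
group <- rep(c(0, 1), each = 500)
y <- rcauchy(1000, group, 10)
mockdata <- data.frame(group = factor(group), y = y)

# 10th and 90th percentiles of the pooled data (what the suggested ylim uses)
pooled <- quantile(mockdata$y, probs = c(0.10, 0.90))

# The same percentiles computed separately within each group
by_group <- tapply(mockdata$y, mockdata$group, quantile, probs = c(0.10, 0.90))

# Share of each group's values lying outside the pooled limits
share_outside <- tapply(mockdata$y, mockdata$group,
                        function(v) mean(v < pooled[1] | v > pooled[2]))

print(pooled)
print(by_group)
print(share_outside)
```

By construction about 20% of all values fall outside the pooled percentiles, but that share need not be split evenly between the groups, which is why a per-group display on the plot itself is more informative.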

1 Answer


You can compute the number of values beyond each limit in advance, per group, and display the counts with, e.g., geom_text:

library(dplyr)
library(ggplot2)

upper_lim <- 40
lower_lim <- -40
# Flag the values that lie beyond each axis limit
mockdata$upper_cut <- mockdata$y > upper_lim
mockdata$lower_cut <- mockdata$y < lower_lim
mockdata$group <- as.factor(mockdata$group)
# Count the flagged values within each group
mockpts <- mockdata %>% 
    group_by(group) %>% 
    summarise(upper_count = sum(upper_cut), 
              lower_count = sum(lower_cut)) 

# Print the counts at the lower and upper limits; hjust = 1.5 places
# each label to the left of its group's centre line
ggplot(mockdata, aes(group, y)) + 
    geom_boxplot(aes(fill = group)) + 
    coord_cartesian(ylim = c(lower_lim, upper_lim)) + 
    geom_text(y = lower_lim, data = mockpts, 
              aes(label = lower_count, x = group), hjust = 1.5) + 
    geom_text(y = upper_lim, data = mockpts, 
              aes(label = upper_count, x = group), hjust = 1.5)

[Plot: the same boxplots, with each group's count of values below -40 and above 40 printed at the bottom and top edges]
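The question also asks about proportions, not just counts. A variant of the approach above that labels each edge with both the count and the percentage of the group's values beyond that limit might look like the following sketch. It swaps dplyr for base R, and the column names (`n_lower`, `upper_label`, etc.), the label format and the `nudge_x`/`vjust` values are illustrative choices.

```r
library(ggplot2)

# Same mock data as in the question, with group stored as a factor
set.seed(123)
group <- rep(c(0, 1), each = 500)
y <- rcauchy(1000, group, 10)
mockdata <- data.frame(group = factor(group), y = y)

lower_lim <- -40
upper_lim <- 40

# One row per group: counts beyond each limit and the group size
cut_summary <- do.call(rbind, lapply(levels(mockdata$group), function(g) {
  v <- mockdata$y[mockdata$group == g]
  data.frame(group = g,
             n_lower = sum(v < lower_lim),
             n_upper = sum(v > upper_lim),
             n = length(v))
}))
cut_summary$group <- factor(cut_summary$group, levels = levels(mockdata$group))

# Labels of the form "<count> (<percent>%)"
cut_summary$lower_label <- sprintf("%d (%.1f%%)", cut_summary$n_lower,
                                   100 * cut_summary$n_lower / cut_summary$n)
cut_summary$upper_label <- sprintf("%d (%.1f%%)", cut_summary$n_upper,
                                   100 * cut_summary$n_upper / cut_summary$n)

p <- ggplot(mockdata, aes(group, y)) +
  geom_boxplot(aes(fill = group)) +
  coord_cartesian(ylim = c(lower_lim, upper_lim)) +
  # inherit.aes = FALSE because cut_summary has no y column; y is fixed at the
  # limit, vjust moves each label inward and nudge_x moves it beside the whisker
  geom_text(data = cut_summary, aes(x = group, label = lower_label),
            y = lower_lim, vjust = -0.5, nudge_x = 0.3, inherit.aes = FALSE) +
  geom_text(data = cut_summary, aes(x = group, label = upper_label),
            y = upper_lim, vjust = 1.5, nudge_x = 0.3, inherit.aes = FALSE)

print(p)
```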

tonytonov