
I have a data frame with several groups (the 'label' column). For each label, I want to plot a null distribution obtained by bootstrapping (values in the 'null' column), with the true statistic drawn on top as a line (value in the 'sc' column). Ideally, each line should appear on top of its own density plot without ever hiding the density plots of previous groups. This is the code I have so far:

library(ggplot2)
library(tidyverse)
library(ggridges)

line_height <- 1.7 # parameter to control the height of the green lines

df <- data.frame()

set.seed(7) # for reproducibility
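for (label in LETTERS) {
  mu   <- rnorm(1, 0.5, 0.2)     # group-specific mean of the null distribution
  null <- rnorm(1000, mu, 0.1)   # bootstrapped null values
  sc   <- rnorm(1, 0.5, 0.2)     # true statistic for this group
  df <- rbind(df, data.frame(label = label, null = null, sc = sc))
}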

df <- df %>% 
  mutate(label=as.factor(label))

scs = df %>% 
  group_by(label) %>% 
  summarise(sc = unique(sc)) %>%
  mutate(label = as.integer(label)) # make sure label is the same as in ggplot_build

# plot
p <- ggplot(df, aes(x = null, y = label, group = factor(label)))  +
  stat_density_ridges(geom = "density_ridges_gradient",alpha = 1, size=1,
                      calc_ecdf = TRUE, fill="#dadada",
                      quantiles=2, quantile_lines = TRUE) +
  scale_x_continuous(limits=c(0,1), breaks=seq(0,1,0.1))+
  coord_flip() +
  theme_classic()

I then make adjustments with ggplot_build (thanks to @Quinten!):

q <- ggplot_build(p)

#> Picking joint bandwidth of 0.0224
q$data[[1]] = q$data[[1]] %>%
  left_join(., scs,
            by = c("group" = "label")) %>%
  mutate(fill = ifelse(x < sc, "#646464", fill),
         x = ifelse(datatype=='vline', sc, x),
         colour = ifelse(datatype=='vline', '#4EAF4A', colour),
         ymax = ifelse(datatype=='vline', ymin+line_height, ymax),
         size = ifelse(datatype=='vline', 1.5,size)) 

qplot <- ggplot_gtable(q) 
plot(qplot)

The resulting figure is this:

[Figure: almost there with one annoying problem]

This is almost what I need, except that the green lines should not be covered by the black outlines. I tried reordering the rows in q$data[[1]] so that the lines come last within each group, but this doesn't seem to help. Am I missing something? Thanks :)
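To make the reordering attempt concrete, here is a minimal sketch of the kind of row reordering meant above. It runs on a small toy stand-in for q$data[[1]] (the `toy` data frame and its values are hypothetical, and only the columns needed for the illustration are included), not on the real built plot data:

```r
library(dplyr)

# Hypothetical stand-in for q$data[[1]]: two groups, each with
# ridgeline rows and one 'vline' row in arbitrary order
toy <- data.frame(
  group    = c(1, 1, 1, 2, 2, 2),
  datatype = c("vline", "ridgeline", "ridgeline",
               "ridgeline", "vline", "ridgeline"),
  x        = c(0.5, 0.1, 0.2, 0.1, 0.4, 0.2)
)

# Sort by group, then put the 'vline' rows last within each group
# (FALSE sorts before TRUE, so non-vline rows come first)
reordered <- toy %>%
  arrange(group, datatype == "vline")

print(reordered)
```

Applied to the real q$data[[1]], this kind of reordering changes the row order but, as described above, not the drawing order of the lines relative to the outlines.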

TanZor