I have a data frame with several groups (the 'label' column). For each label, I want to plot a null distribution obtained by bootstrapping (values in the 'null' column) with the true statistic drawn on top as a line (value in the 'sc' column). Ideally, each line should appear on top of its own density plot, but it should never hide the densities of previous groups. This is the code I have so far:
library(ggplot2)
library(tidyverse)
library(ggridges)
line_height <- 1.7 # parameter to control the height of the green lines
df <- data.frame()
set.seed(7) # for reproducibility
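for (label in LETTERS) {
  group_mean <- rnorm(1, 0.5, 0.2)      # centre of this group's null distribution
  null <- rnorm(1000, group_mean, 0.1)  # simulated null values
  sc <- rnorm(1, 0.5, 0.2)              # true statistic for this group
  df <- rbind(df, data.frame(label = label, null = null, sc = sc))
}
df <- df %>%
  mutate(label = as.factor(label))
# one row per label holding its true statistic
scs <- df %>%
  group_by(label) %>%
  summarise(sc = unique(sc)) %>%
  mutate(label = as.integer(label)) # make sure label is the same as in ggplot_build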
# plot
p <- ggplot(df, aes(x = null, y = label, group = factor(label))) +
  stat_density_ridges(geom = "density_ridges_gradient", alpha = 1, size = 1,
                      calc_ecdf = TRUE, fill = "#dadada",
                      quantiles = 2, quantile_lines = TRUE) +
  scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.1)) +
  coord_flip() +
  theme_classic()
I then adjust the built plot data with ggplot_build (thanks to @Quinten!):
q <- ggplot_build(p)
#> Picking joint bandwidth of 0.0224
q$data[[1]] <- q$data[[1]] %>%
  left_join(scs, by = c("group" = "label")) %>%
  mutate(fill   = ifelse(x < sc, "#646464", fill),
         x      = ifelse(datatype == "vline", sc, x),
         colour = ifelse(datatype == "vline", "#4EAF4A", colour),
         ymax   = ifelse(datatype == "vline", ymin + line_height, ymax),
         size   = ifelse(datatype == "vline", 1.5, size))
qplot <- ggplot_gtable(q)
plot(qplot)
The resulting figure is this:
This is almost what I need, except that the green lines should not be covered by the black outlines. I tried reordering the rows in q$data[[1]] so that the lines come last within each group, but this doesn't seem to help. Am I missing something?
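For anyone who wants to check which rows I mean, here is a minimal sketch of how the built layer data can be inspected. It uses a small toy data frame instead of my real one; the names `toy`, `p_toy` and `built` are made up for this illustration, and it relies on the same `datatype` column that the code above filters on.

```r
library(ggplot2)
library(ggridges)

set.seed(7)
# Hypothetical two-group toy data, much smaller than the real df
toy <- data.frame(
  label = rep(c("A", "B"), each = 200),
  null  = c(rnorm(200, 0.4, 0.1), rnorm(200, 0.6, 0.1))
)

p_toy <- ggplot(toy, aes(x = null, y = label)) +
  stat_density_ridges(quantiles = 2, quantile_lines = TRUE)

# Layer data as computed by the stat; this is the table edited above
built <- ggplot_build(p_toy)$data[[1]]

# Count rows per group and datatype (ridgeline rows vs. vline rows)
table(built$group, built$datatype)
```

This only shows which rows belong to the ridgelines and which to the lines. It does not tell me whether the row order has any effect on the drawing order, which is the part I am unsure about.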
Thanks :)