I have a dataframe with different groups ('label' column). For each label, I want to plot a null distribution obtained from bootstrapping (values are in the 'null' column) and the true statistic on top (value in the 'sc' column). Ideally, I would like the area after the statistic to have a different color, to mark that this is my p-value. Is this possible to do with stat_density_ridges?

Here is an example R code:

library(ggplot2)
library(tidyverse)
library(ggridges)

df <- data.frame()

for (label in LETTERS) {
  mean=rnorm(1,0.5,0.2)
  null = rnorm(1000,mean,0.1);
  sc = rnorm(1,0.5,0.2)
  df <- rbind(df, data.frame(label=label, null=null, sc=sc))
}

df <- df %>% 
  mutate(label=as.factor(label))

ggplot(df, aes(x = null, y = label))  +
  stat_density_ridges(scale=1.2,alpha = 1, size=1)+
  scale_x_continuous(limits=c(0,1),breaks=seq(0,1,0.2)) +
  geom_segment(aes(x=sc, xend=sc, y=as.numeric(label)-0.1, yend=as.numeric(label)+0.5), size=1) +
  coord_flip()

The resulting figure is this:

[image: ridge plot]

But ideally, I would like each ridge to be more like this:

[image: sketch of the desired ridge]

That is, with the color changing after the sc value. Is that possible? Thanks :)

Quinten
TanZor

1 Answer

You can map fill to ..x.. to switch colors at a given x value. On its own this splits every ridge at the same x position, so the shaded area is identical across groups. To get a per-group threshold, build the plot with ggplot_build and join the layer data with a separate data frame that holds each label's threshold (the sc value, stored in a column named p_value in the code below). With these thresholds you can conditionally change the color in the layer. Here is some reproducible code:

library(ggplot2)
library(tidyverse)
library(ggridges)

df <- data.frame()

set.seed(7) # for reproducibility
for (label in LETTERS) {
  mean=rnorm(1,0.5,0.2)
  null = rnorm(1000,mean,0.1);
  sc = rnorm(1,0.5,0.2)
  df <- rbind(df, data.frame(label=label, null=null, sc=sc))
}

df <- df %>% 
  mutate(label=as.factor(label))
# Create a data frame with the sc threshold per label (stored as p_value)
p_values = df %>% 
  group_by(label) %>% 
  summarise(p_value = unique(sc)) %>%
  mutate(label = as.integer(label)) # make sure label is the same as in ggplot_build

# plot
p <- ggplot(df, aes(x = null, y = label, fill = ifelse(..x.. < sc, "no sign", "sign"), group = factor(label)))  +
  stat_density_ridges(geom = "density_ridges_gradient",
                      scale=1.2,alpha = 1, size=1,
                      calc_ecdf = TRUE) +
  scale_fill_manual(values = c("red", "blue"), name = "") +
  coord_flip()
p
#> Picking joint bandwidth of 0.0224

# Modify layer
q <- ggplot_build(p)
#> Picking joint bandwidth of 0.0224
q$data[[1]] = q$data[[1]] %>%
  left_join(., p_values,
            by = c("group" = "label")) %>%
  mutate(fill = case_when(x < p_value ~ fill,
                          TRUE ~ "blue")) %>%
  select(-p_value)
q <- ggplot_gtable(q)
plot(q)

Created on 2023-03-28 with reprex v2.0.2

As the last plot shows, the color of each ridge now changes at the sc value of that group.
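
For reference, the fixed-threshold variant mentioned at the start of this answer can also be written with after_stat(x), which replaced the ..x.. notation in recent ggplot2 releases (the dot-dot form is deprecated as of ggplot2 3.4.0). The following is only a minimal sketch: the built-in iris data and the cutoff of 6 are stand-ins chosen for illustration, not the data from the question.

```r
library(ggplot2)
library(ggridges)

# Illustrative only: iris and the cutoff of 6 are assumptions for this sketch.
# Every ridge is split at the same x position, because the threshold is a
# single constant rather than a per-group value.
p_fixed <- ggplot(
  iris,
  aes(x = Sepal.Length, y = Species, group = Species,
      fill = after_stat(ifelse(x < 6, "below", "above")))
) +
  stat_density_ridges(geom = "density_ridges_gradient", scale = 1.2) +
  scale_fill_manual(values = c(below = "red", above = "blue"), name = "")
p_fixed
```

Because the cutoff is the same for all groups, this form is only enough when one threshold applies to every ridge. Per-group thresholds still need the ggplot_build step shown above.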

Quinten
  • That's really great, thanks Quinten. I didn't know you could modify plots with ggplot_build this way! – TanZor Mar 28 '23 at 17:12
  • Hi @TanZor, Yes ggplot_build is really nice to modify the layers of ggplot! – Quinten Mar 28 '23 at 20:05
  • Thanks!! I'm noticing a strange pattern where some of the separating lines are not entirely aligned with the p-values. Specifically, it seems like it's plotting something like min(p_value, 0.6). Any idea what this is about? cheers – TanZor Mar 28 '23 at 20:12
  • Do you use the same data with set.seed(7) to make it reproducible? That could give different results. – Quinten Mar 28 '23 at 20:21
  • Yes, I was just running your code and compared the figure to the values in p_values – TanZor Mar 28 '23 at 20:25
  • Ah I think the issue is that we only change red to blue, but not blue to red. When we start with all red in creating p, it works fine: `stat_density_ridges(geom = "density_ridges_gradient", scale=1.2, alpha = 1, size=1, calc_ecdf = TRUE, fill='red')` – TanZor Mar 28 '23 at 20:32
  • Hi @TanZor, I checked the code again and I am not sure what exactly you mean. Let's take label B as an example. The p_value is 0.426, so everything below 0.426 will be red and everything above will be blue. You can check that in the graph. If you want, you could swap this by using > instead of <. Is that what you mean? Hope it's clear – Quinten Mar 29 '23 at 10:41
  • Hi @Quinten: have a look at label E: the p-value is 0.725 but the color changes at around 0.6. – TanZor Mar 29 '23 at 10:50
  • @TanZor, please replace the `mutate` case_when with this `mutate(fill = ifelse(x < p_value, "red", "blue"))`? Now it should work – Quinten Mar 29 '23 at 10:56
  • Yes that's what I did :) (see here: https://stackoverflow.com/questions/75876232/move-line-segments-to-front-with-stat-density-ridges) – TanZor Mar 29 '23 at 10:57
  • Aah great to see it is fixed! – Quinten Mar 29 '23 at 11:02
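
To make the fix from the comments concrete: the case_when step in the answer keeps whatever fill a point already has below the threshold, so it can turn red into blue but never blue back into red. The sketch below reproduces this on a toy data frame standing in for one group of the built layer data. The x values and the initial color flip at 0.6 are hypothetical, picked to mimic the label E case described in the comments (threshold 0.725).

```r
library(dplyr)

# Hypothetical stand-in for one group of ggplot_build(p)$data[[1]]:
# the initial fill already flips to blue above x = 0.6.
layer <- data.frame(
  group = 1L,
  x     = c(0.50, 0.65, 0.70, 0.80),
  fill  = c("red", "blue", "blue", "blue")
)
# Threshold table in the same shape as p_values in the answer
p_values <- data.frame(label = 1L, p_value = 0.725)

joined <- left_join(layer, p_values, by = c("group" = "label"))

# Original step: existing fill is kept below the threshold,
# so the blue at x = 0.65 and x = 0.70 survives
one_way <- joined %>%
  mutate(fill = case_when(x < p_value ~ fill, TRUE ~ "blue"))

# Replacement from the comments: recolor in both directions
two_way <- joined %>%
  mutate(fill = ifelse(x < p_value, "red", "blue"))

one_way$fill  # "red" "blue" "blue" "blue"
two_way$fill  # "red" "red"  "red"  "blue"
```

In the answer's pipeline, this amounts to swapping the mutate line for the ifelse version and leaving the left_join and select steps unchanged.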