I would like to modify an existing Sankey plot built with ggplot2 and ggalluvial to make it more appealing.

My example is from https://corybrunson.github.io/ggalluvial/articles/ggalluvial.html

library(ggplot2)
library(ggalluvial)

data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = response)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")

Created on 2020-10-01 by the reprex package (v0.3.0)

Now I would like to change this plot so that it looks similar to a plot from https://sciolisticramblings.wordpress.com/2018/11/23/sankey-charts-the-new-pie-chart/, i.e.:

  1. change absolute values to relative values (percentages)

  2. add percentage labels

  3. apply a partial fill (e.g. for "missing" and "never")

My approach: I think I could change the axis to percentages with something like scale_y_continuous(label = scales::percent_format(scale = 100)). However, I am not sure about steps 2 and 3.

captcoma

1 Answer

This could be achieved like so:

  1. To switch to percentages, add a new column to your data frame holding the percentage share of each row within its survey. This column can then be mapped to y instead of freq.

  2. To get nice percentage labels on the axis, use scale_y_continuous(label = scales::percent_format()).

  3. For the partial fill, map e.g. response %in% c("Missing", "Never") to fill (which gives TRUE for "Missing" and "Never" and FALSE otherwise) and set the fill colors via scale_fill_manual.

  4. The percentage of each stratum can be added to the label via label = paste0(..stratum.., "\n", scales::percent(..count.., accuracy = .1)) in geom_text. This uses the variables ..stratum.. and ..count.., which are computed by stat_stratum.
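Before the full example, steps 1 and 4 can be checked in isolation. This is a minimal sketch on a hypothetical toy data frame (toy and its values are made up for illustration and are not the vaccinations data). It shows that the shares add up to 1 within each survey and what the label text looks like.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration only)
toy <- data.frame(
  survey   = rep(c("s1", "s2"), each = 2),
  response = rep(c("Always", "Never"), times = 2),
  freq     = c(30, 10, 25, 15)
)

# Step 1: share of each row within its survey
toy_pct <- toy %>%
  group_by(survey) %>%
  mutate(pct = freq / sum(freq)) %>%
  ungroup()

# Sanity check: the shares within each survey should sum to 1
totals <- toy_pct %>%
  group_by(survey) %>%
  summarise(total = sum(pct))
print(totals)

# Step 4: label text combining the stratum name and its percentage
labels <- paste0(toy_pct$response, "\n",
                 scales::percent(toy_pct$pct, accuracy = .1))
print(labels)
```

In the plot itself the label is built from the values computed by stat_stratum rather than from the raw rows, but the formatting call is the same.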

library(ggplot2)
library(ggalluvial)
library(dplyr)

data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))

vaccinations <- vaccinations %>% 
  group_by(survey) %>% 
  mutate(pct = freq / sum(freq))

ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = pct,
           fill = response %in% c("Missing", "Never"), 
           label = response)) +
  scale_x_discrete(expand = c(.1, .1)) +
  scale_y_continuous(label = scales::percent_format()) +
  scale_fill_manual(values = c(`TRUE` = "cadetblue1", `FALSE` = "grey50")) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(aes(label = paste0(..stratum.., "\n", scales::percent(..count.., accuracy = .1))), stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")
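
The ..stratum.. and ..count.. notation is deprecated as of ggplot2 3.4.0 in favor of after_stat(). The following is a sketch of the same plot using after_stat(), assuming ggplot2 3.3.0 or later and a ggalluvial version whose stat_stratum computes count (as used in the code above). The names vaccinations_pct and p are introduced here only for illustration.

```r
library(ggplot2)
library(ggalluvial)
library(dplyr)

data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))

# Percentage share of each row within its survey
vaccinations_pct <- vaccinations %>%
  group_by(survey) %>%
  mutate(pct = freq / sum(freq)) %>%
  ungroup()

p <- ggplot(vaccinations_pct,
            aes(x = survey, stratum = response, alluvium = subject,
                y = pct,
                fill = response %in% c("Missing", "Never"))) +
  scale_x_discrete(expand = c(.1, .1)) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_manual(values = c(`TRUE` = "cadetblue1", `FALSE` = "grey50")) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  # after_stat() wraps the whole expression that uses computed variables
  geom_text(aes(label = after_stat(
              paste0(stratum, "\n", scales::percent(count, accuracy = .1)))),
            stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")

p
```

Wrapping the whole label expression in a single after_stat() call keeps it clear which names refer to computed variables rather than columns of the data.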

stefan
  • Thank you for your help, this solution looks really nice. Regarding 2., how can the percentage be added right next to the value of each stratum (like in the example plot)? – captcoma Oct 01 '20 at 08:00
  • Hi @captcoma. I just made an edit to add the percentages to the labels. Best S. – stefan Oct 01 '20 at 08:23