I created the following alluvial diagram in R:

library(alluvial)

df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
                 Pearson1 = c(6, 3, 2, 5, 4, 1),
                 Spearman1 = c(6, 5, 1, 2, 3, 4),
                 Kendall1 = c(6, 5, 1, 2, 3, 4),
                 Pearson2 = c(6, 5, 1, 2, 3, 4),
                 Spearman2 = c(6, 5, 1, 2, 4, 3),
                 Kendall2 = c(6, 5, 1, 2, 3, 4))
df$freq <- 1
alluvial(df[1:7], freq = df$freq, cex = 0.7, col = "red")

which results in an alluvial diagram with every flow drawn in red.

How can I give specific flows a color other than red? For example, X1 from Variable to Pearson1 and again from Kendall1 to Spearman2, and X3 across all axes. As far as I can tell, alluvial() cannot do this. How can I recreate the diagram above with another function?

tjebo
nickolakis

1 Answer

ggalluvial allows aesthetics to vary along a single "flow" (or alluvium). The documentation describes a trick for this: use geom_flow with stat = "alluvium" and specify lode.guidance = "frontback".

The actual aesthetic (color) needs to be added to the data. geom_flow and geom_stratum need different columns for this aesthetic (try using the same column for both to see what happens). I pass the color names directly and use scale_fill_identity(), but you can also use arbitrary labels and then map them to colors with scale_fill_manual().

library(ggalluvial)
#> Loading required package: ggplot2
library(tidyverse)
df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
                 Pearson1 = c(6, 3, 2, 5, 4, 1),
                 Spearman1 = c(6, 5, 1, 2, 3, 4),
                 Kendall1 = c(6, 5, 1, 2, 3, 4),
                 Pearson2 = c(6, 5, 1, 2, 3, 4),
                 Spearman2 = c(6, 5, 1, 2, 4, 3),
                 Kendall2 = c(6, 5, 1, 2, 3, 4))

df_long <-
  df %>%
  ## reshape your data in order to bring it to the right shape
  mutate(across(everything(), as.character)) %>%
  rownames_to_column("ID") %>%
  pivot_longer(-ID) %>%
  ## correct order of your x
  mutate(
    name = factor(name, levels = names(df)),
    ## now hard code where you want to change the color.
    ## lodes need different highlighting than your strata
    ## there are of course many ways to add this information, I am using case_when here
    ## you could also create separate vectors and add them to your data frame
    highlight_lode = case_when(
      ID == 3 ~ "blue",
      ID == 1 & name %in% c("Variable", "Kendall1", "Pearson2") ~ "orange",
      TRUE ~ "red"
    ),
    highlight_stratum = case_when(
      ID == 3 ~ "blue",
      ID == 1 & name %in% c(
        "Variable", "Pearson1", "Kendall1", "Pearson2",
        "Spearman2"
      ) ~ "orange",
      TRUE ~ "red"
    )
  )

ggplot(df_long, 
       ## now use different color aesthetics in geom_flow and geom_stratum
       aes(x = name, stratum = value, alluvium = ID, label = value)) +
  ## I took this trick with lode guidance from the documentation;
  ## it allows aesthetics to vary over one flow
  geom_flow(
    aes(fill = highlight_lode),
    stat = "alluvium", lode.guidance = "frontback", color = "darkgray"
  ) +
  geom_stratum(aes(fill = highlight_stratum)) +
  geom_text(stat = "stratum") +
  ## as I have named the colors directly, it is appropriate to use scale_identity
  scale_fill_identity()
#> Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.

Created on 2023-01-29 with reprex v2.0.2
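
As a sketch of the scale_fill_manual() alternative mentioned above: instead of storing literal color names, the data can carry descriptive labels that are mapped to colors in the scale. The column names group_lode and group_stratum, the labels "X1 (selected)", "X3" and "other", and the highlight_colors vector are my own illustrative choices, not part of the original answer. The rest mirrors the code above.

```r
library(ggalluvial)  # also attaches ggplot2
library(dplyr)
library(tidyr)
library(tibble)

df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
                 Pearson1 = c(6, 3, 2, 5, 4, 1),
                 Spearman1 = c(6, 5, 1, 2, 3, 4),
                 Kendall1 = c(6, 5, 1, 2, 3, 4),
                 Pearson2 = c(6, 5, 1, 2, 3, 4),
                 Spearman2 = c(6, 5, 1, 2, 4, 3),
                 Kendall2 = c(6, 5, 1, 2, 3, 4))

df_long <-
  df %>%
  mutate(across(everything(), as.character)) %>%
  rownames_to_column("ID") %>%
  pivot_longer(-ID) %>%
  mutate(
    name = factor(name, levels = names(df)),
    ## labels instead of color names (label text is an illustrative assumption)
    group_lode = case_when(
      ID == "3" ~ "X3",
      ID == "1" & name %in% c("Variable", "Kendall1", "Pearson2") ~ "X1 (selected)",
      TRUE ~ "other"
    ),
    group_stratum = case_when(
      ID == "3" ~ "X3",
      ID == "1" & name %in% c(
        "Variable", "Pearson1", "Kendall1", "Pearson2", "Spearman2"
      ) ~ "X1 (selected)",
      TRUE ~ "other"
    )
  )

## one place to define the palette; "other" is the fallback color for all remaining flows
highlight_colors <- c("X3" = "blue", "X1 (selected)" = "orange", "other" = "red")

p <- ggplot(df_long,
            aes(x = name, stratum = value, alluvium = ID, label = value)) +
  geom_flow(aes(fill = group_lode), stat = "alluvium",
            lode.guidance = "frontback", color = "darkgray") +
  geom_stratum(aes(fill = group_stratum)) +
  geom_text(stat = "stratum") +
  scale_fill_manual(values = highlight_colors, name = "Highlight")

p
```

With this variant the colors live in a single named vector, so changing the default color for all non-highlighted flows only means editing the "other" entry, and the plot gets a legend for free.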

Cory Brunson
tjebo
  • Thank you for your time, that seems to work very well. However, I have some questions. 1. In the mutate() call I set the color for the flows I want to highlight. How can I make all the others have a specific color, let's say `red`? – nickolakis Jan 28 '23 at 21:44
  • 2. How can I make the margins between the categories smaller and the flows of each variable thinner, like the ones in my example? – nickolakis Jan 28 '23 at 21:49
  • @nickolakis I've updated the answer now. I am not sure what you mean by (2). – tjebo Jan 29 '23 at 10:16
  • Thank you for the clarification. In 2, I want to make the visual "cleaner". Is it possible to have equal spacing between the categories `1,2,...,6`? That is, the boxes would have some space between them and be smaller (and thus the flow lines thinner). – nickolakis Jan 29 '23 at 15:53
  • @nickolakis I see. I don't think this package allows for that. The vignette explicitly points to ggforce for "free floating" alluvium/sankey charts, but I failed to implement this because I don't really understand how the data needs to be prepared. It might not even be possible to create "1 to 1" alluvia with ggforce. It may be worth asking a new question (if you do, please link to this one). – tjebo Jan 29 '23 at 18:11
  • Thank you for your effort. It's not essential at the moment, but if I post it I will link it! – nickolakis Jan 29 '23 at 19:08