I've been trying for days now to sort the order of strata and flows in ggalluvial. I want to visualize the flow of patients through different screening procedures (X1, X2, X3, X4) and color the flows based on the final diagnosis (the values in X4).

Can you help me sort the values within the groups in the first columns of example A and B? I want all red, yellow, and blue values stacked on top of each other within each of the groups.

So far I have tried various combinations of wide format, aes.flow = "backward" and "forward", lode.guidance, and lode.ordering...

If this is not possible in ggalluvial but possible in other packages, I'd like to know as well.

Thanks in advance.

DATA in wide format:

library(tidyverse)   # tibble, dplyr, tidyr (gather), ggplot2
library(ggalluvial)

set.seed(1)
data <- tibble(
  ID = 1:879,
  X1 = sample(c("only_parent", "parent_and_3D", "only_3D"), size = 879, replace = TRUE, prob = c(0.1, 0.8, 0.1))) %>% 
  mutate(
    X2 = case_when(
      X1 == "only_parent" ~ sample(c("only_I", "not_identified"), size = n(), prob = c(0.1, 0.9), replace = TRUE),
      X1 == "parent_and_3D" ~ sample(c("only_I", "both_I_and_II", "only_II", "not_identified"), size = n(), prob = c(0.05, 0.05, 0.2, 0.7), replace = TRUE),
      X1 == "only_3D"~ sample(c("only_II", "not_identified"), size = n(), prob = c(0.1, 0.9), replace = TRUE),
      TRUE ~ NA_character_)) %>% 
  mutate(
    X3 = case_when(
      X2 == "only_I" ~ "PO_only",
      X2 == "both_I_and_II" ~ sample(c("PO_and_EHL", "PO_and_F/T", "PO_and_F/T_and_EHL"), size = n(), prob = c(0.3, 0.5, 0.2), replace = TRUE),
      X2 == "only_II"~ sample(c("F/T", "F/T_and_EHL", "EHL"), size = n(), prob = c(0.1, 0.6, 0.4), replace = TRUE),
      X2 == "not_identified" ~ "not_identified",
      TRUE ~ NA_character_)) %>% 
  mutate(
    X4 = case_when(
      X3 == "PO_only"    ~ sample(c("Two_primary_ind", "One_primary_ind", "No TW"), size = n(), prob = c(0.02, 0.1, 0.88), replace = TRUE),
      X3 == "PO_and_EHL" ~ sample(c("Two_primary_ind", "One_primary_ind", "No TW"), size = n(), prob = c(0.05, 0.2, 0.75), replace = TRUE),
      X3 == "PO_and_F/T" ~ sample(c("Two_primary_ind", "One_primary_ind", "No TW"), size = n(), prob = c(0.05, 0.2, 0.75), replace = TRUE),
      X3 == "PO_and_F/T_and_EHL" ~ sample(c("Two_primary_ind", "One_primary_ind", "No TW"), size = n(), prob = c(0.05, 0.2, 0.75), replace = TRUE),
      X3 == "F/T" ~ sample(c("Two_primary_ind", "One_primary_ind", "No TW"), size = n(), prob = c(0.02, 0.1, 0.88), replace = TRUE),
      X3 == "F/T_and_EHL" ~ sample(c("Two_primary_ind", "One_primary_ind", "No TW"), size = n(), prob = c(0.05, 0.2, 0.75), replace = TRUE),
      X3 == "EHL" ~ sample(c("Two_primary_ind", "One_primary_ind", "No TW"), size = n(), prob = c(0.02, 0.2, 0.88), replace = TRUE),
      X3 == "not_identified" ~ "not_identified",
      TRUE ~ NA_character_ ))

head(data)

# A tibble: 6 x 5
     ID X1            X2             X3             X4            
  <int> <chr>         <chr>          <chr>          <chr>         
1     1 parent_and_3D not_identified not_identified not_identified
2     2 parent_and_3D only_II        F/T_and_EHL    No TW         
3     3 parent_and_3D not_identified not_identified not_identified
4     4 only_parent   only_I         PO_only        No TW         
5     5 parent_and_3D only_II        F/T_and_EHL    No TW         
6     6 only_3D       not_identified not_identified not_identified

Example A
The values are not sorted in the bottom box of the first column.

data_long_a <- data %>% 
  group_by(X1, X2, X3, X4) %>% 
  count() %>% 
  mutate(
    fill_stat = factor(X4, levels = c("not_identified", "No TW", "One_primary_ind", "Two_primary_ind"))) %>% 
  ungroup  %>%
  arrange(fill_stat) %>% 
  mutate(subject = seq(1, n())) %>% 
  gather(key, value, -n , -subject, -fill_stat) %>% 
  mutate(
    key = factor(key, levels = c("X1", "X2", "X3", "X4"))) %>% 
  arrange(key, fill_stat) 



data_long_a %>% 
  filter(key %in% c("X1", "X2")) %>% 
  ggplot(
    aes(x = key,
        y = n,
        stratum = value, 
        alluvium = subject,
        label = value))+
  geom_flow(aes(fill = fill_stat)) +
  geom_stratum() +
  geom_text(stat = "stratum")+
  scale_fill_manual(values=c("#BAB3B3EB", "red", "yellow", "blue"))+
  theme_void()

Example A - Intended results

Example B
The flow lines in the first column are not sorted.

data_long_b <- data %>%
  select(-X1) %>% 
  filter(X4 != "not_identified") %>% 
  group_by(X2, X3, X4) %>% 
  count() %>% 
  mutate(
    fill_stat = factor(X4, levels = c("not_identified", "No TW", "One_primary_ind", "Two_primary_ind"))) %>% 
  ungroup  %>%
  arrange(fill_stat) %>% 
  mutate(subject = seq(1, n())) %>% 
  gather(key, value, -n , -subject, -fill_stat) %>% 
  mutate(
    key = factor(key, levels = c("X2", "X3", "X4"))) %>% 
  arrange(key, fill_stat) 


data_long_b %>% 
  ggplot(
    aes(x = key,
        y = n,
        stratum = value, 
        alluvium = subject,
        label = value))+
  geom_flow(aes(fill = fill_stat),
            aes.flow = "backward") +
  geom_stratum() +
  geom_text(stat = "stratum")+
  scale_fill_manual(values=c("red", "yellow", "blue"))+
  theme_void()

Example B - not intended result

Steen Harsted
  • Have a look at the second visual example [here](http://corybrunson.github.io/ggalluvial/reference/stat_flow.html). Is that what you're after, regarding the sorting of the flows? – Cory Brunson Jul 18 '19 at 22:57
  • Thank you. Yes, that example has the sorted columns that I am looking for. I will check it out tonight when I get to my computer. – Steen Harsted Jul 19 '19 at 11:45
  • OK great. The key is `aes.bind = TRUE`, which forces the orderings of the lodes and flows to respect aesthetics before other criteria. I can work up an example using your data after the weekend; or i'll upvote anyone who beats me to it. – Cory Brunson Jul 20 '19 at 03:41
  • Thanks a lot, it works :-). I have added a solution, but won't accept the answer. If you want to make an example, I'll accept that. Going on holiday now, so it will take some days. Thanks again. – Steen Harsted Jul 20 '19 at 10:08
  • Glad it was what you needed. : ) Since you've already written the code, i can add some explanation to your answer if you'll then accept it. – Cory Brunson Jul 23 '19 at 18:02
  • Sure. I will do that – Steen Harsted Jul 23 '19 at 18:28
  • An explanation would be great. Could you maybe add a few lines on why the default sorting is as it is? I don't understand when the default would be better than `aes.bind = TRUE`. Thanks again. I will accept as soon as you have added some explanation. – Steen Harsted Jul 23 '19 at 18:38
  • OK, i've submitted an edit! While writing it i noticed that `aes.bind` is not documented for `stat_alluvium()` as it is for `stat_flow()`! There might be some subtle reason for that, but i'll go through the package and either add the parameter or else explain why it's not there. – Cory Brunson Jul 25 '19 at 11:19
  • `aes.bind` is now declared and documented for `stat_alluvium()` on GitHub, though i don't believe the functionality has changed. I'll update the version and resubmit to CRAN within a few weeks. Thanks @Steen for this prompt! – Cory Brunson Jul 25 '19 at 11:39
  • Weird... I can't see the edit. I am using a phone app though, so maybe that's why. I will check it and accept the edit if I can when I get back from holiday. Thank you for your great work! – Steen Harsted Jul 25 '19 at 18:09
  • SO tells me that the edit is in peer review. So it might not become visible to you until that's done. [This answer](https://meta.stackexchange.com/a/76284) is the best explanation i've found. – Cory Brunson Jul 25 '19 at 20:36
  • OK, the edit was rejected. If it suits you, i can just post the explanation along with your example solutions as a new answer. – Cory Brunson Jul 27 '19 at 11:52
  • Please do that Cory – Steen Harsted Jul 27 '19 at 13:41

2 Answers

The background here is that, even though the strata (the different values stacked at each axis) may have a natural order, the alluvia, which represent individual cases or cohorts, usually do not. This means that one job of a stat layer (e.g. stat_alluvium()) is to determine an ordering of the lodes within each stratum. (This then determines the flows between strata.)
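To make the vocabulary concrete, here is a minimal sketch on a hypothetical three-row table (the `wide` data frame and its column names are made up for illustration, not taken from the question). It uses ggalluvial's `to_lodes_form()` to reshape one row per cohort into one row per lode, that is, one row per cohort per axis, which is the same shape the question builds by hand with `gather()`.

```r
library(ggalluvial)

# Hypothetical wide ("alluvia") data: one row per cohort, one column per axis.
wide <- data.frame(
  X1 = c("a", "a", "b"),
  X2 = c("p", "q", "q"),
  n  = c(3, 2, 5)
)

# Long ("lodes") data: one row per cohort per axis.
# `key` names the axis column, `value` the stratum column,
# and `id` the alluvium (cohort) column.
long <- to_lodes_form(wide, axes = 1:2,
                      key = "key", value = "value", id = "subject")

long
```

Each `subject` appears once per axis. How those rows are stacked inside a stratum is exactly the ordering that the stat layer has to decide.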

To improve clarity, stat_alluvium() and stat_flow() use the strata of the cases or cohorts at nearby axes to guide their positioning at a given axis. By default, they do this in a "zigzag" order, adapted from the alluvial package; see the "lode guidance" documentation for additional options.

This behavior can be problematic when the user wants to group cohorts together within strata, as when lodes and flows are assigned aesthetics (usually fill, but optionally alpha, colour, linetype, and size). The aes.bind parameter addresses this problem by prioritizing aesthetics before (but not instead of) strata in nearby axes when determining lode orderings.
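As a minimal sketch of the difference, the toy data below (hypothetical, not the question's data) puts four cohorts in a single stratum at X1, with fill colors that do not line up with their strata at X2. Comparing the two plots shows the effect of `aes.bind`: with binding, lodes that share a fill should end up adjacent within the X1 stratum. Newer ggalluvial releases may prefer a character value for `aes.bind` over `TRUE`; check `?stat_flow` for your installed version.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical toy data in long (lodes) form: 4 cohorts on two axes.
toy <- data.frame(
  cohort  = rep(1:4, times = 2),
  axis    = factor(rep(c("X1", "X2"), each = 4)),
  group   = c("a", "a", "a", "a", "p", "q", "p", "q"),
  outcome = factor(rep(c("red", "blue", "blue", "red"), times = 2)),
  n       = rep(c(10, 20, 15, 5), times = 2)
)

base <- ggplot(toy, aes(x = axis, y = n, stratum = group, alluvium = cohort))

# Default: lodes within a stratum are ordered by strata at the neighboring axis.
p_default <- base +
  geom_flow(aes(fill = outcome)) +
  geom_stratum()

# Bound: aesthetics (here, fill) take priority when ordering the lodes.
p_bound <- base +
  geom_flow(aes(fill = outcome), aes.bind = TRUE) +
  geom_stratum()

p_default
p_bound
```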

@Steen provided a syntactic answer, which I'll basically copy here. I make one change, from geom_flow() to geom_alluvium() in Example B (and so from stat_flow() to stat_alluvium() underneath), to illustrate that aes.bind can be passed to, and will be correctly interpreted by, either geom layer.

Example A:

data_long_a %>% 
  filter(key %in% c("X1", "X2")) %>% 
  ggplot(
    aes(x = key,
        y = n,
        stratum = value, 
        alluvium = subject,
        label = value))+
  geom_flow(aes(fill = fill_stat), aes.bind = TRUE) +
  geom_stratum() +
  geom_text(stat = "stratum")+
  scale_fill_manual(values=c("#BAB3B3EB", "red", "yellow", "blue"))+
  theme_void()

Example B:

data_long_b %>% 
  ggplot(
    aes(x = key,
        y = n,
        stratum = value, 
        alluvium = subject,
        label = value))+
  geom_alluvium(aes(fill = fill_stat),
                aes.bind = TRUE) +
  geom_stratum() +
  geom_text(stat = "stratum")+
  scale_fill_manual(values=c("red", "yellow", "blue"))+
  theme_void()

Created on 2019-07-27 by the reprex package (v0.2.1)

Cory Brunson

As Cory Brunson writes in the comments: "The key is `aes.bind = TRUE`".

Example A:

data_long_a %>% 
  filter(key %in% c("X1", "X2")) %>% 
  ggplot(
    aes(x = key,
        y = n,
        stratum = value, 
        alluvium = subject,
        label = value))+
  geom_flow(aes(fill = fill_stat), aes.bind = TRUE) +
  geom_stratum() +
  geom_text(stat = "stratum")+
  scale_fill_manual(values=c("#BAB3B3EB", "red", "yellow", "blue"))+
  theme_void()

Example B:

data_long_b %>% 
  ggplot(
    aes(x = key,
        y = n,
        stratum = value, 
        alluvium = subject,
        label = value))+
  geom_flow(aes(fill = fill_stat),
            aes.bind = TRUE) +
  geom_stratum() +
  geom_text(stat = "stratum")+
  scale_fill_manual(values=c("red", "yellow", "blue"))+
  theme_void()

Steen Harsted