This is my first time using the ggalluvial package, and I am having trouble adapting the design to my purposes.

Consider this example data I came up with:

set.seed(1)  # make the sample() call below reproducible

# Example data; disease and domain (length 5) are recycled to 20 rows
data <- data.frame(person = c(rep("x", 5), rep("y", 2), rep("z", 4), rep("w", 2), rep("v", 7)),
                   disease = c("ADHD", "Depression", "Phobia", "Schizophrenia", "Bipolar"),
                   marker1 = c(sample(paste("cz", 1:15, sep = "")), "cz4", "cz5", "cz10", "cz1", "cz3"),
                   marker2 = rep(paste("ab", 1:10, sep = ""), 2),
                   domain = c("Development", "Mood", "Anxiety", "Psychiatric", "Psychiatric"),
                   freq = 1)

This is the plot I have so far:

library(ggplot2)
library(ggalluvial)

ggplot(data = data,
       aes(axis1 = disease,   
           axis2 = person, 
           axis3 = marker1,
           axis4 = marker2,
           y = freq)) +
  geom_alluvium(aes(fill = domain), curve_type = "linear", width=0.8) + 
  geom_stratum(alpha = .5, width=0.8) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  theme_void()


Note that I added 'freq' just to have more common heights of the bars and that the actual data is a lot larger and names are often longer.

With the plot I want to show relationships between the axes. My first problem is:

I would like to sort the first axis ('disease' here) according to 'domain'. So I would like to have first all 'Phobia' because 'Anxiety' is the first domain, then 'ADHD' for 'Development' etc... (in the real data there would be more diseases per domain). They should correspond with the order in the legend.

Then, I would like to have smaller lines. Is this possible?

Finally, I wonder whether I could set a minimum and maximum height for every box (stratum). In the more complex data set the heights differ a lot: for example, 'ADHD' could be very small (barely readable) and 'Schizophrenia' very large.

Thank you for any help!

Adn
    You can solve your first issue by changing to `axis1 = fct_reorder(disease, as.numeric(factor(domain)))`. It's not clear what you mean by smaller lines. Do you mean thinner alluvial flows? If so, that's not how alluvial plots work; the thickness of the flows has to add up to the size of the strata - that's what the plot is trying to show. The same is also true of your last point. If the width of the flows is not important to you but just a way of linking categories graphically, perhaps you are not looking for an actual alluvial plot, but a bump chart or a graph? – Allan Cameron Aug 01 '23 at 11:59
  • Thanks a lot, @AllanCameron! The first suggestion was already very helpful. You are right that I was looking for thinner alluvial flows and adjustable heights. The problem is: how else could I keep the plot readable without making it extremely large? I would also be open to other ideas for plotting it, but a bump chart does not seem right either, because I want to show these overlaps (e.g. in persons, markers) and it's not longitudinal data. – Adn Aug 01 '23 at 12:15
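
The reordering suggested in the first comment can be tried on its own before touching the plot. This is only a minimal sketch using the five disease/domain pairs from the question; the name `disease_by_domain` is made up for illustration. It assumes the domains should follow their default alphabetical factor order, which is also the order a discrete fill legend would normally use.

```r
library(forcats)

# The five disease/domain pairs from the question's example data
disease <- c("ADHD", "Depression", "Phobia", "Schizophrenia", "Bipolar")
domain  <- c("Development", "Mood", "Anxiety", "Psychiatric", "Psychiatric")

# factor(domain) has alphabetical levels, so as.numeric() gives each
# disease the rank of its domain; fct_reorder() sorts the disease levels
# by that rank
disease_by_domain <- fct_reorder(disease, as.numeric(factor(domain)))

# Inspect the resulting level order (diseases grouped by domain)
levels(disease_by_domain)
```

In the plot this would correspond to `aes(axis1 = fct_reorder(disease, as.numeric(factor(domain))), ...)`, as the comment suggests. Diseases that share a domain tie with each other, so their relative order within the domain is not controlled by this call.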

1 Answer

It sounds like you are trying to show a graph of relationships between variables, and that the absolute values represented by the size of the flows are less important. Perhaps using tidygraph would get you the result you want?

library(tidygraph)
library(ggraph)

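# Stack the three column pairs into one from/to edge list, convert it to a
# graph, and label every node with the connected component it belongs to
g <- setNames(data[c('domain', 'disease')], c('from', 'to')) |>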
  rbind(setNames(data[c('disease', 'marker1')], c('from', 'to'))) |>
  rbind(setNames(data[c('marker1', 'marker2')], c('from', 'to'))) |>
  as_tbl_graph() %>%
  activate(nodes) %>%
  mutate(group = factor(group_components()))

ggraph(g) + 
  geom_edge_bend(width = 0.3, alpha = 0.5) +
  geom_node_label(aes(label = name), fill = 'white', 
                  color = 'white', hjust = 1) +
  geom_node_label(aes(label = name, fill = group), 
                  alpha = 0.2, hjust = 1) +
  coord_flip(clip = 'off') +
  annotate('text', rep(16, 4), 4:1, hjust = 1, size = 5, fontface = 2,
           label = c('Domain', 'Disease', 'Marker 1', 'Marker 2')) +
  scale_y_reverse() +
  theme_void() +
  theme(plot.margin = margin(50, 10, 50, 100)) +
  scale_fill_brewer(palette = 'Set1', guide = 'none')


Allan Cameron
  • Wow, thank you so much, this is really cool and I did not know about it before! I have only one problem: even when I just copy your code, all the colors are the same in my case. When I look at the 'g' object, I also see that 'group' has the value 1 for everything. I cannot figure out exactly where the problem lies, but it would be really good to have the colors correspond to the 'domain', or maybe even the lines as well. Do you know whether this is possible and how to fix it? Thank you so much! – Adn Aug 01 '23 at 13:37
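
One possible reading of the last comment, offered as a sketch rather than a confirmed fix: `group_components()` numbers the connected components of the graph, so if shared markers link every disease to every other one, the whole graph is a single component and every node gets group 1. The example below uses a small made-up data frame (not the question's data) in which one shared marker produces exactly that, and then attaches a `domain` column to the nodes through a hypothetical lookup vector `domain_of`. The column names `component` and `domain` are also just illustrative.

```r
library(tidygraph)

# Small stand-in data (hypothetical values): marker "cz1" is shared by both
# diseases, which makes the whole graph one connected component
data <- data.frame(
  disease = c("ADHD", "Phobia", "Phobia"),
  marker1 = c("cz1", "cz1", "cz2"),
  marker2 = c("ab1", "ab2", "ab2"),
  domain  = c("Development", "Anxiety", "Anxiety")
)

# Same edge-list construction as in the answer
edges <- rbind(
  setNames(data[c("domain", "disease")], c("from", "to")),
  setNames(data[c("disease", "marker1")], c("from", "to")),
  setNames(data[c("marker1", "marker2")], c("from", "to"))
)

# Lookup from node name to domain: each disease maps to its domain and each
# domain maps to itself; marker names are not in the lookup and yield NA
domain_of <- c(
  setNames(data$domain, data$disease),
  setNames(unique(data$domain), unique(data$domain))
)

g <- as_tbl_graph(edges) %>%
  activate(nodes) %>%
  mutate(
    component = group_components(),      # same quantity as the answer's `group`
    domain    = unname(domain_of[name])  # node attribute to colour by instead
  )

# Node table: one row per node with its component and domain
nodes <- as_tibble(g)
nodes
```

With a column like this, the label layer in the answer could map `fill = domain` instead of `fill = group`. Markers that occur in several domains have no single domain, so they stay `NA` here and would be drawn in the fill scale's NA colour.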