I have a structured data set on student performance across classes, including the achievement cohort each student fell into. I want to create a Sankey diagram that visualizes how students' achievement cohorts changed across several classes. My data looks like this:

Course     St_ID    Achievement
Eng101     St_A     Top third
Eng101     St_B     Top third
Eng101     St_C     Middle third
Eng101     St_D     Middle third    
Eng101     St_E     Bottom third
Eng101     St_F     Bottom third
Calc101    St_A     Top third
Calc101    St_B     Bottom third
Calc101    St_C     Bottom third
Calc101    St_D     Top third
Calc101    St_E     Middle third
Calc101    St_F     Middle third
Hist101    St_A     Bottom third
Hist101    St_B     Bottom third
Hist101    St_C     Middle third
Hist101    St_D     Top third
Hist101    St_E     Middle third
Hist101    St_F     Top third

And I want the Sankey diagram to look something like this (not drawn to scale):

[image]

How can I do that?

Alokin

1 Answer

Here's a way to create this type of plot with ggalluvial:

library(ggalluvial)  # also attaches ggplot2

ggplot(df,
       aes(x = Course,              # one axis per course
           label = Achievement,
           stratum = Achievement,   # stacked boxes at each course
           alluvium = St_ID,        # one flow per student
           fill = Achievement)) +
  geom_flow(stat = 'alluvium',
            lode.guidance = 'frontback') +
  geom_stratum()

Created on 2023-06-27 with reprex v2.0.2
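
The snippet above assumes the question's data is already loaded as a data frame named `df`. As a minimal sketch, one way to build it by hand is shown here. This construction is an assumption and not part of the original answer, and turning `Course` into a factor assumes the courses should appear in the order listed in the question rather than alphabetically.

```r
# Rebuild the question's table; `df` is the name the plotting code expects.
df <- data.frame(
  Course = rep(c("Eng101", "Calc101", "Hist101"), each = 6),
  St_ID  = rep(paste0("St_", LETTERS[1:6]), times = 3),
  Achievement = c(
    # Eng101, students A to F
    "Top third", "Top third", "Middle third",
    "Middle third", "Bottom third", "Bottom third",
    # Calc101, students A to F
    "Top third", "Bottom third", "Bottom third",
    "Top third", "Middle third", "Middle third",
    # Hist101, students A to F
    "Bottom third", "Bottom third", "Middle third",
    "Top third", "Middle third", "Top third"
  )
)

# Assumption: keep the courses in the order they appear in the question.
df$Course <- factor(df$Course, levels = c("Eng101", "Calc101", "Hist101"))
```

If the data lives in a file instead, reading it with `read.table(..., header = TRUE)` or similar and then setting the `Course` levels should give the same result.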

Seth
  • This is beautiful. Thank you. Would you know how to arrange the Achievement brackets, so that the "Top third" is on top, and the "Bottom third" is on bottom? – Alokin Jun 27 '23 at 16:01
  • Hi @Alokin - One way you can flip the order is by adding `reverse = FALSE` to the stratum, i.e. `geom_stratum(reverse = FALSE)`. – Seth Jun 27 '23 at 16:13
  • I see. How would you go about creating a custom order for the strata? For example, an order with more than 3 categories, where simply reversing it wouldn't work? – Alokin Jun 27 '23 at 16:19
  • This [vignette](https://cran.r-project.org/web/packages/ggalluvial/vignettes/order-rectangles.html) gives great examples of how to control ordering of the various components. If you strictly want to play with the order of the `Achievement` categories in the strata, you could manually order the factor levels in the data frame: `factor(df$Achievement, levels = ...)` – Seth Jun 27 '23 at 16:33
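
The factor-level suggestion in the last comment can be sketched as follows. The `demo` data frame and the `achievement_order` vector are hypothetical stand-ins for illustration. The same `factor()` call would be applied to `df$Achievement` before plotting.

```r
# Hypothetical stand-in with the same Achievement values as the question.
demo <- data.frame(
  St_ID = c("St_A", "St_C", "St_E"),
  Achievement = c("Top third", "Middle third", "Bottom third")
)

# Without explicit levels, factor() sorts the labels alphabetically.
default_levels <- levels(factor(demo$Achievement))

# Spell out the desired order; this works for any number of categories.
achievement_order <- c("Top third", "Middle third", "Bottom third")
demo$Achievement <- factor(demo$Achievement, levels = achievement_order)

# The factor now carries the custom order instead of the alphabetical one.
custom_levels <- levels(demo$Achievement)
```

The plotting code can stay as it is. Which end of the level order is drawn on top depends on the `reverse` argument mentioned in the earlier comment, so it may take one try to get the direction right. The linked vignette covers the details.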