I tried multiple options in Python but could not get it to work. Then I found the ggalluvial package and need some help with it.
Here is some fake data in the same format as mine. I calculated the frequency (freq) of each unique combination of roles across the five positions, Role 1 to Role 5.
My problem is that some combinations contain NAs, and the number of people in a role can also increase from one position to the next.
For instance, there can be 20 people in Role A at Role 1, but after people move on to Role 2, there are 30 people in Role A. Maybe this is a problem.
Role 1   Role 2   Role 3   Role 4   Role 5   freq
Role A   Role A   Role A   Role B   Role A      8
Role A   Role A   Role A   Role B   NA          5
Role A   Role A   Role A   Role C   NA          4
Role A   Role A   Role B   Role A   Role B      4
Role A   Role A   Role B   Role B   Role A      4
Role A   Role A   NA       NA       NA          3
Role A   Role A   Role C   Role A   NA          3
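Counts that grow or shrink between positions are expected in an alluvial plot: each row is a cohort of freq people, so the size of Role A can change from one role to the next, while the total at every role stays equal to sum(freq) as long as NA is kept as its own category rather than dropped. Below is a minimal base-R sketch on the fake data above; the column names Role1 to Role5, the object names and the "None" label for NA are assumptions for illustration.

```r
# Fake data from the question (assumed column names without spaces: Role1..Role5)
roles <- data.frame(
  Role1 = rep("Role A", 7),
  Role2 = rep("Role A", 7),
  Role3 = c("Role A", "Role A", "Role A", "Role B", "Role B", NA, "Role C"),
  Role4 = c("Role B", "Role B", "Role C", "Role A", "Role B", NA, "Role A"),
  Role5 = c("Role A", NA, NA, "Role B", "Role A", NA, NA),
  freq  = c(8, 5, 4, 4, 4, 3, 3),
  stringsAsFactors = FALSE
)
role_cols <- paste0("Role", 1:5)

# Keep missing roles as an explicit category (hypothetical label "None")
roles[role_cols] <- lapply(roles[role_cols],
                           function(x) ifelse(is.na(x), "None", x))

# Number of people in each role type at each position (0 if a type is absent)
levels_all <- c("Role A", "Role B", "Role C", "None")
counts <- sapply(role_cols, function(col) {
  tapply(roles$freq, factor(roles[[col]], levels = levels_all), sum, default = 0)
})

print(counts)
# every column sums to sum(roles$freq) = 31
print(colSums(counts))
```

Role A shrinks from Role 1 to Role 4 and grows again at Role 5, yet every column has the same total, which is what keeps the strata heights comparable across the five roles.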
My ideal output looks like this example, but with the 5 roles in place of the 5 semesters and colored by Role A, B, C and D.
I was also thinking of restructuring the data like the majors example dataset from the package:
   Frequency  Role   Type
1          1  Role1  Role A
2          2  Role1  Role B
3          6  Role1  Role C
4          8  Role1  Role D
5          9  Role2  Role A
6         10  Role2  Role B
7         11  Role2  Role C
8         12  Role2  Role D
9         14  Role3  Role A
10        15  Role3  Role B
11         1  Role3  Role C
12         2  Role3  Role D
But plotting it with this code does not give me the desired output:
library(ggplot2)
library(ggalluvial)

# df is the long table above (columns Frequency, Role, Type)
ggplot(df,
       aes(x = Role, stratum = Type, alluvium = Frequency, y = 1)) +
  geom_alluvium(fill = "darkgrey") +
  geom_stratum(aes(fill = Type)) +
  theme_bw()
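In ggalluvial's lodes (long) form, alluvium has to identify the same cohort at every role, and y should carry that cohort's frequency. In the attempt above, y = 1 gives every lode the same height, and alluvium = Frequency links lodes that merely share a count value. The sketch below assumes the wide table is converted with ggalluvial's to_lodes_form(); the column names Role, Type and alluvium, and the "None" label for NA, are illustrative choices, not taken from the question.

```r
library(ggplot2)
library(ggalluvial)

# Fake data from the question (assumed column names: Role1..Role5)
roles <- data.frame(
  Role1 = rep("Role A", 7),
  Role2 = rep("Role A", 7),
  Role3 = c("Role A", "Role A", "Role A", "Role B", "Role B", NA, "Role C"),
  Role4 = c("Role B", "Role B", "Role C", "Role A", "Role B", NA, "Role A"),
  Role5 = c("Role A", NA, NA, "Role B", "Role A", NA, NA),
  freq  = c(8, 5, 4, 4, 4, 3, 3),
  stringsAsFactors = FALSE
)
role_cols <- paste0("Role", 1:5)

# Keep missing roles as an explicit stratum (hypothetical label "None")
roles[role_cols] <- lapply(roles[role_cols],
                           function(x) ifelse(is.na(x), "None", x))

# Wide (alluvia) form -> long (lodes) form: one row per cohort per role.
# Each original row gets an alluvium id that links it across the five roles.
lodes <- to_lodes_form(roles, axes = 1:5,
                       key = "Role", value = "Type", id = "alluvium")

p <- ggplot(lodes,
            aes(x = Role, stratum = Type, alluvium = alluvium,
                y = freq, fill = Type)) +
  geom_flow(color = "darkgrey") +   # flows between adjacent roles
  geom_stratum() +                  # one box per role type at each role
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 3) +
  theme_bw()

print(p)
```

Because every cohort keeps its id and its freq at all five roles, the stacked heights at each role add up to the same total, and the Role A box can still be larger at one role than at another.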
Any help here is much appreciated.
Update: after restructuring the data, I achieved the desired result with a non-programming tool, but I am still looking for a way to get the same output in R.