Transforming data with NAs into ggalluvial format and visualizing it as an alluvial plot in R

Question

I tried several approaches in Python without success. Then I found the ggalluvial package, and I need some help with it.

Here is some fake data in this format. I calculated the frequency of each unique combination of roles across five positions (Role 1 to Role 5).

My problem is that some combinations contain NAs, and the count for a given role can also increase from one position to the next.

For instance, Role 1 might have 20 people in Role A, but after people move on to Role 2, the total for Role A is 30. Maybe this is part of the problem.

Role 1  Role 2  Role 3  Role 4  Role 5  freq
Role A  Role A  Role A  Role B  Role A  8
Role A  Role A  Role A  Role B  NA      5
Role A  Role A  Role A  Role C  NA      4
Role A  Role A  Role B  Role A  Role B  4
Role A  Role A  Role B  Role B  Role A  4
Role A  Role A  NA      NA      NA      3
Role A  Role A  Role C  Role A  NA      3

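A minimal sketch of the sample data above as a self-contained R data frame (the kind of definition the comment below asks for), followed by one possible way to handle the NAs: recoding them as an explicit stratum. The column names `Role1` to `Role5`, the object names and the label `"None"` are my own assumptions, not part of the original data. The last step totals `freq` for Role A at each position, to show that a role's total can legitimately shrink or grow between positions.

```r
# Sample data from the question in wide format: one row per unique
# combination of roles, with its count in `freq`.
# Column names Role1..Role5 are assumed (spaces removed from "Role 1" etc.).
df_wide <- data.frame(
  Role1 = rep("Role A", 7),
  Role2 = rep("Role A", 7),
  Role3 = c("Role A", "Role A", "Role A", "Role B", "Role B", NA, "Role C"),
  Role4 = c("Role B", "Role B", "Role C", "Role A", "Role B", NA, "Role A"),
  Role5 = c("Role A", NA, NA, "Role B", "Role A", NA, NA),
  freq  = c(8, 5, 4, 4, 4, 3, 3),
  stringsAsFactors = FALSE
)

# Recode missing roles as an explicit "None" stratum (hypothetical label)
# so that people without a role at a position stay visible in a plot.
role_cols <- paste0("Role", 1:5)
df_wide[role_cols] <- lapply(df_wide[role_cols],
                             function(x) ifelse(is.na(x), "None", x))

# Total count in Role A at each position; it differs between positions,
# which an alluvial plot draws as strata of different heights per axis.
role_a_totals <- sapply(role_cols, function(col) {
  sum(df_wide$freq[df_wide[[col]] == "Role A"])
})
print(role_a_totals)
```

With the NAs recoded, every position accounts for the same total count (the sum of `freq`), so only the split between strata changes from one position to the next.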

My ideal output is a plot like the example for 5 semesters, but with my 5 roles as the positions and colored by Role A, B, C and D.
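A possible sketch for this kind of output: ggalluvial can plot the wide data (one row per combination) directly, with one `axis*` aesthetic per position and `y` set to the counts, following the axis-based pattern used in the ggalluvial documentation. It assumes ggplot2 3.3.0 or later (for `after_stat()`), recodes NAs as a hypothetical `"None"` stratum, and colors the flows by the role at the last position, which is only one arbitrary choice.

```r
library(ggplot2)
library(ggalluvial)

# Sample data from the question (wide format); column names are assumed.
df_wide <- data.frame(
  Role1 = rep("Role A", 7),
  Role2 = rep("Role A", 7),
  Role3 = c("Role A", "Role A", "Role A", "Role B", "Role B", NA, "Role C"),
  Role4 = c("Role B", "Role B", "Role C", "Role A", "Role B", NA, "Role A"),
  Role5 = c("Role A", NA, NA, "Role B", "Role A", NA, NA),
  freq  = c(8, 5, 4, 4, 4, 3, 3),
  stringsAsFactors = FALSE
)
role_cols <- paste0("Role", 1:5)
# Make missing roles an explicit stratum ("None" is a hypothetical label).
df_wide[role_cols] <- lapply(df_wide[role_cols],
                             function(x) ifelse(is.na(x), "None", x))

# One axis per position; each row's flow has height `freq`.
p <- ggplot(df_wide,
            aes(y = freq, axis1 = Role1, axis2 = Role2, axis3 = Role3,
                axis4 = Role4, axis5 = Role5)) +
  geom_alluvium(aes(fill = Role5), width = 1/12) +
  geom_stratum(width = 1/12, fill = "grey90", color = "grey40") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = role_cols, expand = c(0.05, 0.05)) +
  labs(x = NULL, y = "Count", fill = "Role 5") +
  theme_bw()
print(p)
```

Changing `fill = Role5` to another column colors the flows by the role at that position instead.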

I was also thinking of restructuring the data like the majors dataset example from the package:

    Frequency   Role    Type
1   1           Role1   Role A
2   2           Role1   Role B
3   6           Role1   Role C
4   8           Role1   Role D
5   9           Role2   Role A
6   10          Role2   Role B
7   11          Role2   Role C
8   12          Role2   Role D
9   14          Role3   Role A
10  15          Role3   Role B
11  1           Role3   Role C
12  2           Role3   Role D

But plotting it with the code below does not give me the result I want:

library(ggplot2)
library(ggalluvial)

ggplot(df,
       aes(x = Role, stratum = Type, alluvium = Frequency, y = 1)) +
  geom_alluvium(fill = "darkgrey") +
  geom_stratum(aes(fill = Type)) +
  theme_bw()
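As far as I can tell, the code above passes `Frequency` as the `alluvium` aesthetic, but in lodes (long) form `alluvium` is meant to identify which rows belong to the same flow. A table of per-role totals like the one above has no such identifier, so there are no transitions to draw. A sketch of an alternative, assuming the wide data from the first table: `to_lodes_form()` from ggalluvial creates one row per combination and position, with an `alluvium` id per combination, in a layout similar to the majors example. The `geom_flow()` call mirrors the majors example in the ggalluvial vignette; the column names `Role` and `Type` and the `"None"` label are my own assumptions.

```r
library(ggplot2)
library(ggalluvial)

# Sample data from the question (wide format); column names are assumed.
df_wide <- data.frame(
  Role1 = rep("Role A", 7),
  Role2 = rep("Role A", 7),
  Role3 = c("Role A", "Role A", "Role A", "Role B", "Role B", NA, "Role C"),
  Role4 = c("Role B", "Role B", "Role C", "Role A", "Role B", NA, "Role A"),
  Role5 = c("Role A", NA, NA, "Role B", "Role A", NA, NA),
  freq  = c(8, 5, 4, 4, 4, 3, 3),
  stringsAsFactors = FALSE
)
role_cols <- paste0("Role", 1:5)
# Make missing roles an explicit stratum ("None" is a hypothetical label).
df_wide[role_cols] <- lapply(df_wide[role_cols],
                             function(x) ifelse(is.na(x), "None", x))

# Lodes form: one row per (combination, position). `alluvium` identifies
# the original row, `Role` is the position and `Type` the role held there;
# the non-axis column `freq` is carried along for every lode.
df_lodes <- to_lodes_form(df_wide, axes = 1:5,
                          key = "Role", value = "Type", id = "alluvium")

# Flows and strata are both colored by the role held at each position.
p <- ggplot(df_lodes,
            aes(x = Role, stratum = Type, alluvium = alluvium, y = freq,
                fill = Type)) +
  geom_flow(stat = "alluvium", lode.guidance = "frontback",
            color = "darkgray") +
  geom_stratum() +
  theme_bw()
print(p)
```

The key difference from the original code is that `alluvium` is an id for each combination and `y` is the count, rather than `alluvium = Frequency` with `y = 1`.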

Any help here is greatly appreciated.

Restructuring data like this

I achieved the desired result with a non-programming tool, but I am still looking for a way to produce the same output in R.

Could you provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example)? This would begin with any necessary `library()` calls and proceed through the generation of the plot (using the code you already included). You would need [Edit: rather, I think it best practice] to define `df` from scratch, e.g. from the output of `dput(df)`. — Cory Brunson, Aug 30 '22 at 12:47
