I have the following data frame, where each patient is a row (I am showing only a sample of it):
df = structure(list(firstY = c("N/A", "1", "3a", "3a", "3b", "1",
"2", "1", "5", "3b"), secondY = c("N/A", "1", "2", "3a", "4",
"1", "N/A", "1", "5", "3b"), ThirdY = c("N/A", "1", "N/A", "3b",
"4", "1", "N/A", "1", "N/A", "3b"), FourthY = c("N/A", "1", "N/A",
"3a", "4", "1", "N/A", "1", "N/A", "3a"), FifthY = c("N/A", "1",
"N/A", "2", "5", "1", "N/A", "N/A", "N/A", "3b")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
I would like to plot a Sankey diagram that shows each patient's trajectory over time. I know that I have to create nodes and links, but I am having trouble transforming the data into the required format. The hardest part is counting how many patients follow each transition between consecutive years, for example how many patients moved from stage 1 in the first year to stage 2 in the second year, and likewise for every other combination.
Any help with the data preparation would be appreciated.
The alluvial package, although simple to understand, does not cope well with large amounts of data.
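To make the target format concrete, here is a minimal base-R sketch of the counting step I have in mind. I am not sure it is the right or most efficient approach. It rests on a few assumptions of my own: the columns are already in chronological order, "N/A" means the stage is missing, and a transition is dropped whenever either end is missing. The names `links`, `nodes`, `value`, `source_id` and `target_id` are placeholders I made up, and the sample data is repeated as a plain data.frame so the snippet runs on its own.

```r
# Sample data from above, rebuilt as a plain data.frame ("N/A" kept as a string)
df <- data.frame(
  firstY  = c("N/A", "1", "3a", "3a", "3b", "1", "2", "1", "5", "3b"),
  secondY = c("N/A", "1", "2", "3a", "4", "1", "N/A", "1", "5", "3b"),
  ThirdY  = c("N/A", "1", "N/A", "3b", "4", "1", "N/A", "1", "N/A", "3b"),
  FourthY = c("N/A", "1", "N/A", "3a", "4", "1", "N/A", "1", "N/A", "3a"),
  FifthY  = c("N/A", "1", "N/A", "2", "5", "1", "N/A", "N/A", "N/A", "3b"),
  stringsAsFactors = FALSE
)

years <- names(df)  # assumption: columns are already in chronological order

# Build one block of links for each pair of consecutive years, then stack them
links <- do.call(rbind, lapply(seq_len(length(years) - 1), function(i) {
  from <- df[[years[i]]]
  to   <- df[[years[i + 1]]]

  # Assumption: a transition with a missing stage on either side is ignored
  keep <- from != "N/A" & to != "N/A"

  # Prefix each stage with its year so "1" in firstY and "1" in secondY
  # become different nodes, then cross-tabulate to count patients
  counts <- as.data.frame(
    table(source = paste(years[i], from[keep], sep = "_"),
          target = paste(years[i + 1], to[keep], sep = "_")),
    responseName = "value",
    stringsAsFactors = FALSE
  )

  # table() also lists combinations nobody followed; drop those
  counts[counts$value > 0, ]
}))
rownames(links) <- NULL

# Node table: every label that appears as a source or a target
nodes <- data.frame(name = unique(c(links$source, links$target)),
                    stringsAsFactors = FALSE)

# Zero-based node indices, which some Sankey packages (networkD3, for one)
# expect in place of the labels
links$source_id <- match(links$source, nodes$name) - 1
links$target_id <- match(links$target, nodes$name) - 1

print(links)
```

Each row of `links` is one transition between two consecutive years, and `value` is the number of patients who followed it. Whether dropping the "N/A" transitions is appropriate depends on the analysis; keeping them as a separate "missing" node would only require removing the `keep` filter.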