I'm trying to build a Sankey diagram using the networkD3 package in R.
I think I've set up the dataset correctly, starting from a contingency table. Here is my code:
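library(dplyr)  # provides filter() and the %>% pipe used below
# Cross-tabulate the two answers and convert to long format (columns Var1, Var2, Freq)
M <- as.data.frame(table(as.character(df$q4.1), as.character(df$q4.2)))
# Drop answer combinations that never occur
M <- filter(M, Freq != 0)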
q4.1 and q4.2 are two categorical variables with the same categories. I'm interested in visualizing the flows from the answers to q4.1 to the answers to q4.2.
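A minimal reproducible version of the tabulation step, in base R rather than dplyr and with a small hypothetical `df` (the real categories and counts are not shown in the question, so the values below are invented). It produces the same long-format link table with columns `Var1`, `Var2` and `Freq`:

```r
# Hypothetical toy data standing in for the real survey answers
df <- data.frame(
  q4.1 = c("Yes", "Yes", "No", "No idea", "No idea"),
  q4.2 = c("Yes", "No",  "No", "No idea", "Yes"),
  stringsAsFactors = FALSE
)

# Cross-tabulate, then reshape to one row per (q4.1 answer, q4.2 answer) pair
M <- as.data.frame(table(as.character(df$q4.1), as.character(df$q4.2)),
                   stringsAsFactors = FALSE)

# Keep only combinations that actually occur (base R equivalent of filter())
M <- M[M$Freq != 0, ]
M
```

Note that the pair "No idea" / "No idea" is a legitimate row here: it is only the node list built afterwards that turns it into a loop.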
nodes <- data.frame(
name=c(as.character(M$Var1), as.character(M$Var2)) %>% unique()
)
The problem is that, because both variables use the same category names, the node list ends up with only one node per name.
Therefore, if somebody selected the same option in both questions, that flow ends up with the same source and target (see the example below):
M$IDsource <- match(M$Var1, nodes$name)-1
M$IDtarget <- match(M$Var2, nodes$name)-1
M
Var1 Var2 Freq IDsource IDtarget
No idea No idea 16 7 7
As you can imagine, the resulting graph looks odd: people who gave the same answer to both questions are drawn as a loop that flows back into the same node.
Is renaming the categories in the second question the only way to solve this? Or am I doing something wrong?
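To see which flows are affected before plotting, here is a small sketch using a hypothetical edge list shaped like `M` (apart from the "No idea" row with 16 responses, the labels and counts are made up). It rebuilds the `unique()` node list and the 0-based `match()` indices in the same way as the code above, then lists the rows whose source and target collapse onto the same node:

```r
# Hypothetical edge list with the same columns as M (Var1 = q4.1 answer, Var2 = q4.2 answer)
M <- data.frame(
  Var1 = c("No", "No idea", "Yes"),
  Var2 = c("No", "No idea", "No"),
  Freq = c(3, 16, 2),
  stringsAsFactors = FALSE
)

# unique() keeps one row per label, so each answer is a single node for BOTH questions
nodes <- data.frame(name = unique(c(M$Var1, M$Var2)), stringsAsFactors = FALSE)

# match() is 1-based; networkD3 expects 0-based node indices, hence the -1
M$IDsource <- match(M$Var1, nodes$name) - 1
M$IDtarget <- match(M$Var2, nodes$name) - 1

# Flows whose source and target are the same node are the ones drawn as loops
loops <- M[M$IDsource == M$IDtarget, ]
loops
```

With this data, both the "No" to "No" and the "No idea" to "No idea" flows show up as loops.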
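One possible workaround that avoids visibly renaming anything, sketched under the assumption that networkD3 links refer to nodes only by their 0-based row position in the nodes data frame, while the `NodeID` column is used just as a label: give every category two node rows (one per question) with identical names, and shift the target indices by the number of categories. The variable names `cats` and `n` and all data below are hypothetical.

```r
# Hypothetical edge list with the same columns as M
M <- data.frame(
  Var1 = c("No", "No idea", "Yes", "Yes"),
  Var2 = c("No", "No idea", "No",  "Yes"),
  Freq = c(3, 16, 2, 5),
  stringsAsFactors = FALSE
)

# Shared category labels and how many there are
cats <- sort(unique(c(M$Var1, M$Var2)))
n <- length(cats)

# Rows 1..n are the q4.1 nodes, rows n+1..2n the q4.2 nodes; names repeat on purpose
nodes <- data.frame(name = c(cats, cats), stringsAsFactors = FALSE)

# 0-based indices; targets are offset by n so they point at the q4.2 copy of each label
M$IDsource <- match(M$Var1, cats) - 1
M$IDtarget <- match(M$Var2, cats) - 1 + n
M
```

The resulting `M` and `nodes` could then be passed to `sankeyNetwork()` with `Source = "IDsource"`, `Target = "IDtarget"`, `Value = "Freq"` and `NodeID = "name"`. Adding a suffix such as "_q4.2" to the second set of names would be the explicit-renaming alternative.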
Thanks for the support!
P.S. I have already created the graph I want with the ggalluvial package on top of ggplot2. However, it is not as nice as the plot you get with networkD3 (and it cannot be exported as an htmlwidget), so I would like to recreate the same graph with networkD3. Here is the ggalluvial code that worked:
library(ggplot2)
library(ggalluvial)
ggplot(data = M, aes(axis1 = Var1, axis2 = Var2, y = Freq)) +
  scale_x_discrete(limits = c("Next 6 months", "Next 12-18 Months"), expand = c(0.1, 0.05)) +
  geom_alluvium() + geom_stratum() +
  # label each stratum with its category; replaces the deprecated infer.label = TRUE
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))
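For comparison, a rough networkD3 counterpart of the ggalluvial plot might look like the sketch below. It hard-codes a hypothetical two-column layout (rows 1-3 of `nodes` are the q4.1 answers, rows 4-6 the q4.2 answers) so the same label can appear once on each side. `sankeyNetwork()` and its `Links`, `Nodes`, `Source`, `Target`, `Value` and `NodeID` arguments are networkD3's own; the data and the `fontSize`/`nodeWidth` values are only illustrative.

```r
library(networkD3)

# Hypothetical nodes: each answer listed once for q4.1 (rows 1-3) and once for q4.2 (rows 4-6)
nodes <- data.frame(name = rep(c("No", "No idea", "Yes"), times = 2),
                    stringsAsFactors = FALSE)

# Hypothetical links using 0-based row positions in nodes
links <- data.frame(
  IDsource = c(0, 1, 2, 2),  # q4.1 side
  IDtarget = c(3, 4, 3, 5),  # q4.2 side (offset by 3)
  Freq     = c(3, 16, 2, 5)
)

p <- sankeyNetwork(Links = links, Nodes = nodes,
                   Source = "IDsource", Target = "IDtarget",
                   Value = "Freq", NodeID = "name",
                   fontSize = 12, nodeWidth = 30)
p
```

By default networkD3 colours nodes by `NodeID`, so the same answer should get the same colour on both sides. Because `p` is an htmlwidget, it can be written to a standalone page with `htmlwidgets::saveWidget(p, "sankey.html")`.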