I have a dataset covering 3 years (2019-2021), which looks something like the following:
Date Topic Users
01/01/2019 News user_a
02/01/2019 Sports user_b
03/01/2019 Entertainment user_c
...
01/01/2020 Weather user_a
02/01/2020 News user_a
03/01/2020 Business user_c
...
01/01/2021 Sports user_b
02/01/2021 Business user_b
03/01/2021 News user_c
...
29/12/2021 Entertainment user_c
30/12/2021 News user_a
31/12/2021 Sports user_a
In summary, dates repeat: multiple topics can be mentioned on the same date, and multiple users can be talking about one or more topics.
The idea behind creating a Sankey plot for the three years is to show how users shift between topics from year to year. For example, if certain users are talking about Sports in 2019, are they still talking about it in 2020?
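To make that example concrete, here is a minimal base R sketch of the number I would expect a single flow to carry. It assumes the columns are named Date, Topic and Users, that dates are in dd/mm/yyyy format, and it uses a small made-up data frame (`df`, not my real data) in which user_b talks about Sports in both years:

```r
# Toy data in the same shape as my dataset (values are made up for illustration).
df <- data.frame(
  Date  = c("01/01/2019", "02/01/2019", "03/01/2019",
            "01/01/2020", "02/01/2020", "03/01/2020"),
  Topic = c("News", "Sports", "Entertainment",
            "Weather", "Sports", "Business"),
  Users = c("user_a", "user_b", "user_c",
            "user_a", "user_b", "user_c"),
  stringsAsFactors = FALSE
)

# Extract the year from the dd/mm/yyyy date string (assumed format).
df$Year <- as.integer(format(as.Date(df$Date, format = "%d/%m/%Y"), "%Y"))

# Distinct users talking about Sports in each year.
sports_2019 <- unique(df$Users[df$Year == 2019 & df$Topic == "Sports"])
sports_2020 <- unique(df$Users[df$Year == 2020 & df$Topic == "Sports"])

# Users who talked about Sports in 2019 and still do in 2020.
stayed <- intersect(sports_2019, sports_2020)
length(stayed)
```

The length of `stayed` is what I would want as the width of the Sports (2019) to Sports (2020) flow; the other flows would be the same count for every other pair of topics in consecutive years.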
To create such a plot, I've been trying to adapt the following code, taken from here:
# Libraries
library(tidyverse)
library(viridis)
library(patchwork)
library(hrbrthemes)
library(circlize)
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv", header=TRUE)
# Package
library(networkD3)
# I need a long format
data_long <- data %>%
  rownames_to_column %>%
  gather(key = 'key', value = 'value', -rowname) %>%
  filter(value > 0)
colnames(data_long) <- c("source", "target", "value")
data_long$target <- paste(data_long$target, " ", sep="")
# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(name=c(as.character(data_long$source), as.character(data_long$target)) %>% unique())
# With networkD3, connections must be provided using ids, not the real names used in the links data frame, so we need to reformat it.
data_long$IDsource=match(data_long$source, nodes$name)-1
data_long$IDtarget=match(data_long$target, nodes$name)-1
# prepare colour scale
ColourScal ='d3.scaleOrdinal() .range(["#FDE725FF","#B4DE2CFF","#6DCD59FF","#35B779FF","#1F9E89FF","#26828EFF","#31688EFF","#3E4A89FF","#482878FF","#440154FF"])'
# Make the Network
sankeyNetwork(Links = data_long, Nodes = nodes,
              Source = "IDsource", Target = "IDtarget",
              Value = "value", NodeID = "name",
              sinksRight=FALSE, colourScale=ColourScal, nodeWidth=40, fontSize=13, nodePadding=20)
However, this code has no notion of year, and I still haven't been able to figure out how to create the source and target links from my data.
To illustrate the desired output, I am pasting a modified figure below (originally taken from the output of the above code/link):
In it, the year is shown at the bottom, topics replace the country names, and each flow indicates the number of users talking about a given topic from one year to the next.
Any help please?