I have a dataset covering three years (2019-2021) that looks something like this:

      Date      Topic           Users
  01/01/2019     News           user_a
  02/01/2019     Sports         user_b
  03/01/2019     Entertainment  user_c
   ...
  01/01/2020     Weather        user_a
  02/01/2020     News           user_a
  03/01/2020     Business       user_c
   ...
  01/01/2021     Sports         user_b
  02/01/2021     Business       user_b
  03/01/2021     News           user_c
   ...
  29/12/2021     Entertainment  user_c
  30/12/2021     News           user_a
  31/12/2021     Sports         user_a

In summary, dates repeat: several topics can be mentioned on the same date, and each user can talk about one or more topics.
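For reproducibility, the structure can be rebuilt from the sample rows shown above. This is only a sketch: it assumes the dates are in day/month/year order (29/12/2021 rules out month/day), and the names `df`, `Year` and `users_per_topic` are illustrative, not part of the original data.

```r
# Sample rows copied from the excerpt above; the real data has many more rows.
df <- data.frame(
  Date  = c("01/01/2019", "02/01/2019", "03/01/2019",
            "01/01/2020", "02/01/2020", "03/01/2020",
            "01/01/2021", "02/01/2021", "03/01/2021",
            "29/12/2021", "30/12/2021", "31/12/2021"),
  Topic = c("News", "Sports", "Entertainment",
            "Weather", "News", "Business",
            "Sports", "Business", "News",
            "Entertainment", "News", "Sports"),
  Users = c("user_a", "user_b", "user_c",
            "user_a", "user_a", "user_c",
            "user_b", "user_b", "user_c",
            "user_c", "user_a", "user_a"),
  stringsAsFactors = FALSE
)

# Assumption: dates are day/month/year.
df$Date <- as.Date(df$Date, format = "%d/%m/%Y")
df$Year <- as.integer(format(df$Date, "%Y"))

# Number of distinct users per topic and year
# (the quantity the flows are meant to be based on).
users_per_topic <- aggregate(Users ~ Year + Topic, data = df,
                             FUN = function(u) length(unique(u)))
users_per_topic
```

Extracting `Year` up front keeps the later grouping independent of the exact dates, which only matter here for assigning each row to a year.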

The idea behind creating a sankey plot for three years is to show the shift of users talking about those topics. For example, if certain users are talking about Sports in 2019, are they still talking about it in 2020 as well?

To create such a plot, I've been trying to adapt the following code, taken from here:

# Libraries
library(tidyverse)
library(viridis)
library(patchwork)
library(hrbrthemes)
library(circlize)

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv", header=TRUE)
# Package
library(networkD3)

# I need a long format
data_long <- data %>%
  rownames_to_column %>%
  gather(key = 'key', value = 'value', -rowname) %>%
  filter(value > 0)
colnames(data_long) <- c("source", "target", "value")
data_long$target <- paste(data_long$target, " ", sep="")

# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(name=c(as.character(data_long$source), as.character(data_long$target)) %>% unique())
 
# With networkD3, connections must be provided as zero-based ids, not as the real names used in the links data frame, so we need to reformat it.
data_long$IDsource=match(data_long$source, nodes$name)-1 
data_long$IDtarget=match(data_long$target, nodes$name)-1

# prepare colour scale
ColourScal ='d3.scaleOrdinal() .range(["#FDE725FF","#B4DE2CFF","#6DCD59FF","#35B779FF","#1F9E89FF","#26828EFF","#31688EFF","#3E4A89FF","#482878FF","#440154FF"])'

# Make the Network
sankeyNetwork(Links = data_long, Nodes = nodes,
                     Source = "IDsource", Target = "IDtarget",
                     Value = "value", NodeID = "name", 
                     sinksRight=FALSE, colourScale=ColourScal, nodeWidth=40, fontSize=13, nodePadding=20)

However, it is missing the year dimension, and I still haven't been able to figure out how to create the source and target links from my data.

To understand the desired output, I am pasting a modified figure below (originally taken from the output of above code/link):

[Image: modified Sankey diagram based on the output of the code above]

Here the year is shown at the bottom, the nodes are topics instead of country names, and the width of each flow indicates the number of users who move from one topic to another between consecutive years.
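One possible way to derive such year-to-year links is sketched below, in base R. It is an illustration under assumptions, not a tested solution: the toy rows are a subset of the sample shown earlier with the year already extracted, and the names `ut`, `pairs`, `links`, `Year_from`, `Year_to` and the "Topic (Year)" node labels are all made up for this example. Note that a user who discusses several topics in one year contributes to several links, so the flow into and out of a node need not balance.

```r
# Toy data: a subset of the sample rows, with the year already extracted.
df <- data.frame(
  Year  = c(2019, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021),
  Topic = c("News", "Sports", "Entertainment",
            "Weather", "News", "Business",
            "Sports", "Business", "News"),
  Users = c("user_a", "user_b", "user_c",
            "user_a", "user_a", "user_c",
            "user_b", "user_b", "user_c"),
  stringsAsFactors = FALSE
)

# One row per (user, year, topic), so repeated posts are not counted twice.
ut <- unique(df[, c("Users", "Year", "Topic")])

# Pair each user's topics in one year with the same user's topics in the next year.
pairs <- merge(ut, ut, by = "Users", suffixes = c("_from", "_to"))
pairs <- pairs[pairs$Year_to == pairs$Year_from + 1, ]

# Label nodes as "Topic (Year)" so that each year gets its own set of nodes.
pairs$source <- paste0(pairs$Topic_from, " (", pairs$Year_from, ")")
pairs$target <- paste0(pairs$Topic_to, " (", pairs$Year_to, ")")

# Link value = number of distinct users making that topic-to-topic move.
links <- aggregate(Users ~ source + target, data = pairs,
                   FUN = function(u) length(unique(u)))
names(links)[names(links) == "Users"] <- "value"

# networkD3 expects zero-based node indices rather than names.
nodes <- data.frame(name = unique(c(as.character(links$source),
                                    as.character(links$target))),
                    stringsAsFactors = FALSE)
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1

# Build the widget only if networkD3 is installed; print `p` to display it.
if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::sankeyNetwork(Links = links, Nodes = nodes,
                                Source = "IDsource", Target = "IDtarget",
                                Value = "value", NodeID = "name",
                                sinksRight = FALSE, fontSize = 13)
}
```

With these toy rows, user_b has no 2020 entry, so no link is produced for that user: only consecutive years are paired. Whether gaps like that should be bridged (2019 directly to 2021) is a design choice this sketch does not make.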

Any help please?

AneesBaqir
  • The values in a sankey measure the flow, not the individual items (or in this case, users), so a sankey is probably not the appropriate type of chart for the data you have. You would likely end up with nodes that have mismatched amount of flow coming in and going out of them. If you try to sketch out what you think this sankey flow should look like, it will probably become more clear to you. – CJ Yetman Jun 15 '22 at 15:19
  • @CJYetman, thank you for your response. I am trying to measure the flow based on users. The idea is to use the number of users as the value for each flow; for instance, in the figure above, let's say user_a is talking about Africa in 2019 and then about Europe in 2020. My apologies if I am not explaining this well. – AneesBaqir Jun 15 '22 at 15:22
  • Something like the answer here https://stackoverflow.com/questions/46880502/networkd3-sankey-diagram-in-r-how-to-calculate-value-for-each-link?rq=1, except that I am using ```Users``` instead of **clientID**. – AneesBaqir Jun 15 '22 at 15:39
  • Like I said, try sketching it out and you’ll see why it doesn’t add up. – CJ Yetman Jun 16 '22 at 07:39
