I am working through the documentation and tutorials for building a Sankey plot in R with networkD3::sankeyNetwork().
I can get this working using someone else's code (from here: "sankey diagram in R - data preparation", see the tidyverse approach with networkD3 by CJ Yetman).
When I try to implement this myself, my nodes get placed in the wrong order on the x-axis, which makes the flow impossible to understand.
However, I cannot work out where sankeyNetwork() is getting its information about the x-axis location.
Here is my implementation that does not yield the desired result:
library(tidyverse)
library(networkD3)
# Create the data
df <- data.frame('one' = c('a', 'b', 'b', 'a'),
                 'two' = c('c', 'd', 'e', 'c'),
                 'three' = c('f', 'g', 'f', 'f'))
# My code
# Create the links
links <- df %>%
  mutate(row = row_number()) %>%              # Get row for grouping and pivoting
  pivot_longer(-row) %>%                      # Pivot to long format
  group_by(row) %>%
  mutate(source_c = lead(value)) %>%          # Get flow
  filter(!is.na(source_c)) %>%                # Get rid of NA
  rename(target_c = value) %>%                # Correct names
  group_by(target_c, source_c) %>%            # Count frequencies
  summarize(value = n()) %>%
  ungroup() %>%
  mutate(target = as.integer(factor(target_c)),  # Convert to numeric values
         source = as.integer(factor(source_c))) %>%
  mutate(source = source - 1,                 # Zero index
         target = target - 1) %>%
  data.frame()
# Create the nodes
nodes <- data.frame(name = factor(unique(c(links$target_c, links$source_c))))
# Plot the network
sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
              Target = 'target', Value = 'value', NodeID = 'name')
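One check I can run on my own output, assuming (from my reading of the networkD3 documentation) that Source and Target are treated as zero-based row positions in the Nodes data frame, is to map each index back to a node name and compare it with the label I intended. The sketch below uses base R only; the `links_check` and `nodes_check` frames are hand-built stand-ins that mirror the columns my code produces, not output copied from a real run.

```r
# Hypothetical hand-built links, mirroring the label columns from my code above
links_check <- data.frame(
  target_c = c("a", "b", "b", "c", "d", "e"),
  source_c = c("c", "d", "e", "f", "g", "f"),
  stringsAsFactors = FALSE
)

# Same encoding as my code: each column is factored on its own, then zero-indexed
links_check$target <- as.integer(factor(links_check$target_c)) - 1L
links_check$source <- as.integer(factor(links_check$source_c)) - 1L

# Same node table as my code: unique labels in order of first appearance
nodes_check <- data.frame(
  name = unique(c(links_check$target_c, links_check$source_c)),
  stringsAsFactors = FALSE
)

# Look up which node each zero-based index actually points at
links_check$target_lookup <- nodes_check$name[links_check$target + 1L]
links_check$source_lookup <- nodes_check$name[links_check$source + 1L]

# TRUE where the index agrees with the intended label
target_ok <- links_check$target_lookup == links_check$target_c
source_ok <- links_check$source_lookup == links_check$source_c

print(links_check)
```

Comparing the `*_lookup` columns with the `*_c` columns shows whether the two separately factored index columns still refer to the same rows of the node table.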
Yields:
Using the working code from the linked answer:
links <-
  df %>%
  mutate(row = row_number()) %>%                 # add a row id
  gather('col', 'source', -row) %>%              # gather all columns
  mutate(col = match(col, names(df))) %>%        # convert col names to col nums
  mutate(source = paste0(source, '_', col)) %>%  # add col num to node names
  group_by(row) %>%
  arrange(col) %>%
  mutate(target = lead(source)) %>%              # get target from following node in row
  ungroup() %>%
  filter(!is.na(target)) %>%                     # remove links from last column in original data
  select(source, target) %>%
  group_by(source, target) %>%
  summarise(value = n())                         # aggregate and count similar links
# create nodes data frame from unique nodes found in links data frame
nodes <- data.frame(id = unique(c(links$source, links$target)),
                    stringsAsFactors = FALSE)
# remove column id from names
nodes$name <- sub('_[0-9]*$', '', nodes$id)
# set links data to the 0-based index of the nodes in the nodes data frame
links$source <- match(links$source, nodes$id) - 1
links$target <- match(links$target, nodes$id) - 1
sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
              Target = 'target', Value = 'value', NodeID = 'name')
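For reference, this is roughly what the two data frames look like by the time the working code hands them to sankeyNetwork(). I have written them out by hand from the pipeline above, so the exact values and row order are an assumption rather than pasted output; `nodes_expected`, `links_expected` and `flows` are names I made up for this sketch.

```r
# Hand-written copy of the node table the working code should produce for df
# (worked out by hand, so treat the values as an assumption)
nodes_expected <- data.frame(
  id = c("a_1", "b_1", "c_2", "d_2", "e_2", "f_3", "g_3"),
  stringsAsFactors = FALSE
)
# Same clean-up as the working code: drop the column suffix for display
nodes_expected$name <- sub("_[0-9]*$", "", nodes_expected$id)

# Links after the match() step: zero-based row positions in the node table
links_expected <- data.frame(
  source = c(0, 1, 1, 2, 3, 4),
  target = c(2, 3, 4, 5, 6, 5),
  value  = c(2, 1, 1, 2, 1, 1)
)

# Translate the indices back to node ids to read the flows
flows <- paste(
  nodes_expected$id[links_expected$source + 1],
  "->",
  nodes_expected$id[links_expected$target + 1]
)
print(flows)
```

The only columns left in the links frame are source, target and value, and the node table only has id and name, so the column number survives solely as the suffix on id.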
I appreciate that the working code and my code are different, but I can't see where the column number (i.e. the x-axis position) is being passed to sankeyNetwork(): none of its arguments refers to a variable that contains that information. I think I can make my own data-prep code work once I know what the input needs to look like.