
I am using sankeyNetwork() from the networkD3 package to visualize some data. I was wondering if there's a way to "isolate" a branch from start to finish, ignoring the irrelevant links.

Example: I've got this: SankeyGot

And I want to extract this: SankeyWant

Reproducible example:

library(dplyr)      # tibble() and %>%
library(stringr)    # the `words` character vector
library(networkD3)  # sankeyNetwork()

set.seed(9)

df <- tibble(
  source = sample(stringr::words, 5) %>% rep(2),
  target = c(sample(words, 7), source[1:3]),
  values = rnorm(10, 10, 7) %>% round(0) %>% abs())

nodes <- data.frame(names = unique(c(df$source, df$target)))

# networkD3 expects zero-based node indices, hence the "- 1"
links <- tibble(
  source = match(
    df$source, nodes$names) - 1,
  target = match(
    df$target, nodes$names) - 1,
  value = df$values
  )

sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
              Target = "target", Value = "value", NodeID = "names",
              iterations = 64, sinksRight = FALSE, fontSize = 14)

I'd like to be able to pick a node, "name" for example, and get everything that connects to it on all levels upstream and downstream. How would I go about doing this?

Christian
  • What happens if you filter your input tibble `links` as a pre-step to `sankeyNetwork()`? This would be a workaround, as you would run the diagram, then read it manually to see what to filter, and run the diagram again on a filtered dataframe – psychonomics Mar 14 '22 at 16:40
  • @psychonomics the links would need to be filtered, but in some smart way - please see my comment to your submitted answer! – Christian Mar 15 '22 at 08:17

2 Answers


Calculating the paths from a node in a graph is non-trivial, but the igraph package can help with its all_simple_paths() function. However, heed this warning from its help file:

Note that potentially there are exponentially many paths between two vertices of a graph, and you may run out of memory when using this function, if your graph is lattice-like.
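To make that warning concrete, here is a minimal sketch on a made-up layered graph (the node names s, t, a1, b1, ... and the size k are assumptions for illustration, not part of the question's data). Each of the k layers offers two choices, so the number of simple paths from s to t doubles with every layer even though the edge count grows only linearly.

```r
library(igraph)

k <- 10  # number of two-node layers between "s" and "t" (hypothetical size)
layer <- function(i) paste0(c("a", "b"), i)

# s -> layer 1, every node of layer i -> every node of layer i + 1, layer k -> t
edges <- rbind(
  expand.grid(from = "s", to = layer(1), stringsAsFactors = FALSE),
  do.call(rbind, lapply(seq_len(k - 1), function(i) {
    expand.grid(from = layer(i), to = layer(i + 1), stringsAsFactors = FALSE)
  })),
  expand.grid(from = layer(k), to = "t", stringsAsFactors = FALSE)
)

g <- graph_from_data_frame(edges)

n_edges <- nrow(edges)
n_paths <- length(all_simple_paths(g, from = "s", to = "t", mode = "out"))

# 40 edges, but 2^10 = 1024 distinct simple paths
print(c(edges = n_edges, paths = n_paths))
```

For a small Sankey input like the one in the question this is not a problem, but it is worth keeping in mind for larger, densely connected data.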

(I don't know what your words vector is, so I recreated the links data.frame manually)

library(dplyr)
library(networkD3)

set.seed(9)

df <- read.csv(header = TRUE, text = "
source,target
summer,obvious
summer,structure
however,either
however,match
obvious,about
obvious,non
either,contract
either,produce
contract,paint
contract,name
")
df$values <- rnorm(10, 10, 7) %>% round(0) %>% abs()


# use graph to calculate the paths from a node
library(igraph)

graph <- graph_from_data_frame(df)

start_node <- "name"

# get nodes along a uni-directional path going IN to the start_node
connected_nodes_in <- 
  all_simple_paths(graph, from = start_node, mode = "in") %>% 
  unlist() %>% 
  names() %>% 
  unique()

# get nodes along a uni-directional path going OUT of the start_node
connected_nodes_out <- 
  all_simple_paths(graph, from = start_node, mode = "out") %>% 
  unlist() %>% 
  names() %>% 
  unique()

# combine them
connected_nodes <- unique(c(connected_nodes_in, connected_nodes_out))

# filter your data frame so it only includes links/edges that start and
# end at connected nodes
df <- df %>% filter(source %in% connected_nodes & target %in% connected_nodes)



nodes <- data.frame(names = unique(c(df$source, df$target)))

links <- tibble(
  source = match(
    df$source, nodes$names) -1,
  target = match(
    df$target, nodes$names) -1,
  value = df$values
)

sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
              Target = "target", Value = "value", NodeID = "names",
              iterations = 64, sinksRight = FALSE, fontSize = 14)

[Image: Sankey diagram of the filtered links]
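If the cost of enumerating every path is a concern, one possible alternative is igraph's subcomponent(), which only collects the vertices reachable from (or leading to) a node instead of listing each path. The sketch below is a variation on the code above, not a drop-in replacement: it reuses the same edge list, and the names `edges`, `upstream`, `downstream`, and `kept` are chosen here for illustration.

```r
library(igraph)

# Same edge list as above (the values column is not needed for this step)
edges <- read.csv(header = TRUE, text = "
source,target
summer,obvious
summer,structure
however,either
however,match
obvious,about
obvious,non
either,contract
either,produce
contract,paint
contract,name
")

graph <- graph_from_data_frame(edges)
start_node <- "name"

# Vertices that can reach start_node, and vertices reachable from it;
# both sets include start_node itself
upstream   <- as_ids(subcomponent(graph, start_node, mode = "in"))
downstream <- as_ids(subcomponent(graph, start_node, mode = "out"))
connected_nodes <- unique(c(upstream, downstream))

# Keep only the edges whose endpoints are both connected to start_node
kept <- edges[edges$source %in% connected_nodes &
                edges$target %in% connected_nodes, ]
print(kept)
```

As with the filter above, this keeps every edge whose two endpoints are both in the connected set, so in a denser graph it could also keep an edge that runs from an upstream node straight to a downstream node without passing through the start node.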

CJ Yetman
  • "words" is a character vector from stringr containing 980 words. I used those instead of just letters to act as node names. – Christian Mar 15 '22 at 11:54
  • Thanks for clarifying. If you add the library calls for the packages you’re using, it will make your example code more reproducible. – CJ Yetman Mar 15 '22 at 14:21

If you assign the result of sankeyNetwork() to an object, you can use str(object) to see that it is a list with an element called x, which holds your input data (the links and nodes data frames).

list_sankey <- sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
                             Target = "target", Value = "value", NodeID = "names",
                             iterations = 64, sinksRight = FALSE, fontSize = 14)

str(list_sankey)

You can then filter the links data frame in x so that it only contains your desired source and target nodes:

list_sankey_filter <- list_sankey

list_sankey_filter$x$links <- list_sankey_filter$x$links %>% filter(source %in% c(4, 2, 0), target %in% c(4, 2, 0, 10))

This then gives you the object below.

[Image: the filtered Sankey diagram]
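The index vectors in the filter above are typed in by hand. A minimal sketch of deriving them from node names instead, assuming you already have a character vector of the names to keep (for example from igraph, as in the other answer). The toy `nodes` and `links` below are hypothetical stand-ins shaped like the ones in the question, and `keep`, `keep_idx`, and `links_filtered` are names introduced here for illustration.

```r
# Hypothetical toy data in the same shape as the question's `nodes` and `links`
nodes <- data.frame(names = c("however", "either", "contract", "name", "paint"))
links <- data.frame(
  source = c(0, 1, 2, 2),
  target = c(1, 2, 3, 4),
  value  = c(5, 3, 2, 1)
)

# Node names to keep; assumed to have been worked out beforehand
keep <- c("however", "either", "contract", "name")

# networkD3 uses zero-based node indices, hence the "- 1"
keep_idx <- match(keep, nodes$names) - 1

# Keep only links whose source and target are both in the kept set
links_filtered <- links[links$source %in% keep_idx &
                          links$target %in% keep_idx, ]
print(links_filtered)
```

The same `keep_idx` could then stand in for the literal vectors when filtering `list_sankey_filter$x$links`.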

psychonomics
  • Solves part of the problem, but in reality my dataset is much bigger, and I'd like to name a node ("name" for example) and then have the code figure out recursively which sources contribute to that node instead of specifying them manually. Ideally, I guess the statement `filter(source %in% c(4, 2, 0), target %in% c(4, 2, 0, 10))` would need to be replaced with something smart? – Christian Mar 15 '22 at 08:13
  • @Christian - hopefully the answer by CJ Yetman helps. It looks smarter, and preserves the size ratio between name and contract (which my hack did not). – psychonomics Mar 15 '22 at 10:06