
I am trying to build a sankey network. This is my data and code:

library(networkD3)
nodes <- data.frame(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "D", "E", "N", "O", "P", "Q", "R"))
names(nodes) <- "name"
nodes$name = as.character(nodes$name)
links <- data.frame(matrix( 
  c(0,  2,  318.167, 
    0,  3,  73.85,
    0,  4,  51.1262,
    0,  5,  6.83333,
    0,  6,  5.68571,
    0,  7,  27.4167,
    0,  8,  4.16667,
    0,  9,  27.7381,
    1,  10, 627.015,
    1,  3,  884.428,
    1,  4,  364.211,
    1,  13, 12.33333,
    1,  14, 9,
    1,  15, 37.2833,
    1,  16, 9.6,
    1,  17, 30.5485), nrow=16, ncol=3, byrow = TRUE))
colnames(links) <- c("source", "target", "value")
links$source = as.integer(links$source)
links$target = as.integer(links$target)
links$value = as.numeric(links$value)
sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
              Target = "target", Value = "value", NodeID = "name",
              fontSize = 12, fontFamily = 'Arial', nodeWidth = 20)

The problem is that D and E are the only nodes that both A and B link to. Although the links are displayed correctly, D and E are also shown a second time, without any links, at the bottom right. How can I avoid this? Note: if I specify

nodes <- data.frame(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "N", "O", "P", "Q", "R"))

no network at all is created.


zx8754
Florian Seliger

1 Answer


Nodes must be unique; see the example below. I removed the repeated nodes "D" and "E", and then removed the links that reference nodes that do not exist. We now have only 16 nodes, with zero-based indices 0:15, but the last two rows of your links data frame reference nodes 16 and 17.
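To spot such out-of-range links before calling sankeyNetwork(), you can check every index in the links table against the rows of the node table. A minimal base R sketch, assuming the 16-node table from this answer and the source/target columns from the question (the names valid_ids and bad_rows are only illustrative):

```r
# Node table after removing the duplicated "D" and "E": 16 rows, so valid ids are 0:15
nodes <- data.frame(name = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K",
                             "N", "O", "P", "Q", "R"),
                    stringsAsFactors = FALSE)

# Source and target indices of the question's 16 links (values left out here)
links <- data.frame(
  source = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
  target = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 3, 4, 13, 14, 15, 16, 17)
)

# sankeyNetwork() expects zero-based indices into the rows of the node table
valid_ids <- seq_len(nrow(nodes)) - 1

# Rows of links whose source or target has no matching node
bad_rows <- which(!(links$source %in% valid_ids) | !(links$target %in% valid_ids))
bad_rows
#> [1] 15 16
links$target[bad_rows]
#> [1] 16 17
```

Any row reported here points past the end of the node table, which matches the question's observation that no network is drawn at all with the 16-name node table.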


Or, as @CJYetman (the networkD3 author) comments:

Another way to say it... every node that is in the nodes data frame will be plotted, even if it has the same name as another node, because the index is technically the unique id.
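Because a node is identified by its row index rather than its name, the question's original node table contains two separate "D" nodes and two separate "E" nodes, and the second copies are never used by any link. A small base R sketch using the question's original 18-row node table and link indices (dup_ids and unused_ids are illustrative names):

```r
# The question's original node table: "D" and "E" appear twice (indices 3/4 and 11/12)
nodes <- data.frame(name = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K",
                             "D", "E", "N", "O", "P", "Q", "R"),
                    stringsAsFactors = FALSE)
links <- data.frame(
  source = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
  target = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 3, 4, 13, 14, 15, 16, 17)
)

# Zero-based indices of nodes whose name already appeared earlier in the table
dup_ids <- which(duplicated(nodes$name)) - 1
dup_ids
#> [1] 11 12

# Zero-based indices that no link uses; these nodes are still drawn, just unconnected
unused_ids <- setdiff(seq_len(nrow(nodes)) - 1, c(links$source, links$target))
nodes$name[unused_ids + 1]
#> [1] "D" "E"
```

The unused indices coincide with the duplicated ones, which is why the extra D and E show up on their own in the plot.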

library(networkD3)

# Unique node names; a node's id is its zero-based row index (ix is for reference only)
nodes <- data.frame(name = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "N", "O", "P", "Q", "R"), 
                    ix = 0:15)

# Links between those ids; the question's rows 1 -> 16 and 1 -> 17 are dropped
links <- data.frame(matrix( 
  c(0,  2,  318.167, 
    0,  3,  73.85, 
    0,  4,  51.1262,
    0,  5,  6.83333,
    0,  6,  5.68571,
    0,  7,  27.4167,
    0,  8,  4.16667,
    0,  9,  27.7381,
    1,  10, 627.015,
    1,  3,  884.428,
    1,  4,  364.211,
    1,  13, 12.33333,
    1,  14, 9,
    1,  15, 37.2833), nrow=14, ncol=3, byrow = TRUE))
colnames(links) <- c("source", "target", "value")

sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
              Target = "target", Value = "value", NodeID = "name",
              fontSize = 12, fontFamily = 'Arial', nodeWidth = 20)


zx8754
  • So there is no way of getting rid of the nodes in the network? – Florian Seliger Jul 25 '18 at 08:46
  • @FlorianSeliger Not sure what you mean. Could you clarify what the expected output is? In my example, the additional D and E are removed. – zx8754 Jul 25 '18 at 08:48
  • The problem is that in your example the links 1-16 and 1-17 are missing. I want to show all links from my data, but get rid of nodes shown without any links (in my example D and E, in your example N and O). – Florian Seliger Jul 25 '18 at 09:28
  • @FlorianSeliger Because we don't have nodes 16 and 17. Then drop "N" and "O" from the nodes definition. – zx8754 Jul 25 '18 at 09:32
  • @FlorianSeliger Then change the last two links in your links data to 1->3 and 1->4 so that they point to the actual D and E nodes, not the duplicated ones that you removed from the nodes data frame. – CJ Yetman Jul 25 '18 at 11:36
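Another way to get what is asked for in the comments (all 16 links shown, no unconnected nodes) is to describe the links by node name and derive both the node table and the zero-based indices from those names, so duplicates and out-of-range indices cannot occur. This is a minimal sketch and a different fix from the ones suggested above; it assumes each index in the question meant the name at that position of the question's original 18-row node table (so 13 to 17 are N, O, P, Q and R). The names links_named, from and to are hypothetical, and the plot is only drawn if networkD3 is installed:

```r
# Links described by node name, transcribed from the question's data
# (assumption: index i meant the i-th name of the question's 18-row node table)
links_named <- data.frame(
  from  = c(rep("A", 8), rep("B", 8)),
  to    = c("C", "D", "E", "F", "G", "H", "I", "J",
            "K", "D", "E", "N", "O", "P", "Q", "R"),
  value = c(318.167, 73.85, 51.1262, 6.83333, 5.68571, 27.4167, 4.16667, 27.7381,
            627.015, 884.428, 364.211, 12.33333, 9, 37.2833, 9.6, 30.5485),
  stringsAsFactors = FALSE
)

# One row per distinct name, so "D" and "E" cannot appear twice
nodes <- data.frame(name = unique(c(links_named$from, links_named$to)),
                    stringsAsFactors = FALSE)

# Convert names to the zero-based row indices that sankeyNetwork() expects
links <- data.frame(source = match(links_named$from, nodes$name) - 1,
                    target = match(links_named$to, nodes$name) - 1,
                    value  = links_named$value)

# Draw only if networkD3 is available
if (requireNamespace("networkD3", quietly = TRUE)) {
  networkD3::sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
                           Target = "target", Value = "value", NodeID = "name",
                           fontSize = 12, fontFamily = "Arial", nodeWidth = 20)
}
```

Because the node table is built from the links themselves, every node has at least one link, and A and B both connect to the single D and the single E.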