Thanks for any help you can provide.

I am using the networkD3 package in R to draw a forceNetwork plot from a nodelist and a list of links (edgelist).

I have an edgelist / link list:

> edgelist

      round_prob NODEAid NODEBid
33979     0.6245    6990    6588
4899      0.9797    1042    1041
37109     0.6046    7498    7531
27771     0.7144    5906   16029
3603      0.6452     783     804
28491     0.6078    6034    5862
4518      0.6245     962    9874
19613     0.6745    4121   10285
19916     0.8721    4179    4180
8249      0.6821    1737    1733
35389     0.7150    7145   16992
32010     0.6495    6728   16921
22553     0.6959    4722    4549
14996     0.6031    3273   12929
35927     0.6245    7221    9814
15349     0.6245    3337    3233
34833     0.6109    7085    6852
39044     0.6117    7936    7977
39075     0.6844    7944   10978
11691     0.6821    2572    2587

This is a sample of a much larger edgelist, where I have selected only those links with link probability >0.6 and <1. The full edgelist was zero-indexed before the sample was taken.

I also have a nodelist, that is 18000 rows long. A sample of it is this:

> head(nodes)

  node id gr
0 1097  0  1
1 1149  1  1
2 1150  2  1
3 3395  3  1
4 3396  4  1
5 3523  5  1

I try to plot using forceNetwork:

forceNetwork(Links = edgelist, Nodes = nodes, Source = "NODEAid",
             Target = "NODEBid", Value = "round_prob", NodeID = "node",
             Group = "gr", opacity = 0.9)

This gives this plot, before zooming in:

[image: forceNetwork plot before zooming in]

Problem: I only have 20 pairs of nodes, yet my plot has thousands more (I cannot get the exact number).

By hovering over the unconnected points, I have been able to identify that they are made up of all possible nodes that feature in the nodelist.

Basically I think that forceNetwork is plotting every possible node, even those not in the edgelist.

Why is this happening and how can I stop it from doing so?


As per the question "Going crazy with forceNetwork in R: no edges displayed", I made sure that all my data was in numeric format and zero-indexed. I still get this problem.

Note: If I run the forceNetwork example from the question "How to plot a directed Graph in R with networkD3?" or from the tutorial at https://christophergandrud.github.io/networkD3/, the output is as expected.

Chuck

2 Answers

I would think you should subset the node list such that it only includes nodes that are in the edge list.
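
A minimal sketch of that subsetting step, assuming the column names from the question (`NODEAid`, `NODEBid`, `node`, `id`, `gr`) and assuming that the `id` column of the node list holds the same zero-based ids the edge list refers to. The small data frames below are made-up stand-ins, not the asker's data, and `used_ids`, `nodes_sub` and `links` are names chosen for illustration. After subsetting, the link endpoints probably need to be re-mapped to the new row positions, counted from zero:

```r
# Hypothetical stand-in data; column names follow the question.
nodes <- data.frame(node = c(1097, 1149, 1150, 3395, 3396, 3523),
                    id   = 0:5,
                    gr   = 1)
edgelist <- data.frame(round_prob = c(0.62, 0.98),
                       NODEAid    = c(1, 5),
                       NODEBid    = c(3, 1))

# Keep only the nodes that appear in at least one link.
used_ids  <- unique(c(edgelist$NODEAid, edgelist$NODEBid))
nodes_sub <- nodes[nodes$id %in% used_ids, ]

# Re-map the link endpoints to zero-based row positions in nodes_sub.
links <- data.frame(source = match(edgelist$NODEAid, nodes_sub$id) - 1,
                    target = match(edgelist$NODEBid, nodes_sub$id) - 1,
                    value  = edgelist$round_prob)

# Requires the networkD3 package; uncomment to draw the plot.
# library(networkD3)
# forceNetwork(Links = links, Nodes = nodes_sub, Source = "source",
#              Target = "target", Value = "value", NodeID = "node",
#              Group = "gr", opacity = 0.9)
```

Subsetting alone is not enough if the links still carry the old ids, because the `Source` and `Target` values are read as row positions in whatever node data frame is passed.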

  • Hi Christopher. My sincere thanks for your response. I tried that and it didn't work. However, what I didn't do was reset the index for either the nodelist or the edgelist afterwards (i.e. I have a subset of both that still has the original ids `1097 1149 ...`; perhaps I need to reset those ids from 0 again). I will reimplement with this and get back to you. My sincerest thanks to you again. – Chuck May 12 '17 at 14:57
  • Hi Christopher. Thanks to @CJYetman I was able to get this to work. I think that, had he not answered, your suggestion would have done quite well too. Thanks so much for your response, and for writing this package. Have a nice day :) – Chuck May 15 '17 at 07:02

I would suggest either using simpleNetwork, which automatically creates the node list from the edge list you pass, or using code similar to what simpleNetwork does to create your node list first and then passing that to forceNetwork...

edgelist <- read.table(header = TRUE, text = "
round_prob NODEAid NODEBid
33979     0.6245    6990    6588
4899      0.9797    1042    1041
37109     0.6046    7498    7531
27771     0.7144    5906   16029
3603      0.6452     783     804
28491     0.6078    6034    5862
4518      0.6245     962    9874
19613     0.6745    4121   10285
19916     0.8721    4179    4180
8249      0.6821    1737    1733
35389     0.7150    7145   16992
32010     0.6495    6728   16921
22553     0.6959    4722    4549
14996     0.6031    3273   12929
35927     0.6245    7221    9814
15349     0.6245    3337    3233
34833     0.6109    7085    6852
39044     0.6117    7936    7977
39075     0.6844    7944   10978
11691     0.6821    2572    2587
")

library(networkD3)

simpleNetwork(edgelist, Source = 'NODEAid', Target = 'NODEBid')

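# Collect every node id that appears in the edge list as sorted, unique labels
sources <- edgelist$NODEAid
targets <- edgelist$NODEBid
node_names <- factor(sort(unique(c(as.character(sources),
                                   as.character(targets)))))
# One row per node; the row order defines the index the links refer to
nodes <- data.frame(name = node_names, group = 1, size = 8)
# Map each endpoint to its row in nodes, minus 1 for zero-based (JavaScript) indexing
links <- data.frame(source = match(sources, node_names) - 1,
                    target = match(targets, node_names) - 1,
                    value = edgelist$round_prob)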

forceNetwork(Links = links, Nodes = nodes, Source = "source",
             Target = "target", Value = "value", NodeID = "name",
             Group = "group", opacity = 0.9)
CJ Yetman
  • The code (after `simpleNetwork`) does what is needed. And you are correct that the solution is to create your node list first, based on the edge list, and then pass both to `forceNetwork`. Just one niggle: with `simpleNetwork(edgelist)`, because the data frame's columns are not in the order it expects, the graph is shown a bit wrong. – mamonu May 13 '17 at 15:48
  • I modified the `simpleNetwork` command in the code above... you can explicitly set the column names for source and target. – CJ Yetman May 13 '17 at 17:50
  • @CJYetman Worked like a charm - many thanks for your help :) So the index of `nodes` (which, having first ordered the original 4 digit node ids, is reset to zero) is used to encode the `source` and `target` ids. So the values in `source` and `target` should match the reset id on `nodes`, *not* (in this case) the original 4 digit ids left over from the sampling of the original data? – Chuck May 15 '17 at 07:00
  • The values in source and target in your links data frame should match the row/index of the node they refer to in the nodes data frame... but since it's getting passed to JavaScript, JS views the first 'row/index' as 0, not 1 like R. – CJ Yetman May 15 '17 at 07:31
  • @CJYetman Hi again. Just a follow up question for you - when you do the conversion, in `links` your `value` defaults to `1`. How can I make this equal to the original link probability `round_prob` in `edgelist`? Thanks – Chuck May 16 '17 at 12:39
  • I modified the above code to set the `Value` column to the `round_prob` values. Those are really small values though, so you may want to transform them in some way. – CJ Yetman May 16 '17 at 13:04
  • How can we get the network when we have only an edge list and NO node list, something like an email list or social network where cluster behavior is observed? – Sitz Blogz Nov 23 '17 at 20:30
  • That’s what the example shows... build the node list based on your edges. – CJ Yetman Nov 23 '17 at 23:29
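
The zero-based indexing rule discussed in the comments above can be checked before plotting. The sketch below uses a hypothetical helper, `links_index_ok`, which is not part of networkD3; the column names `source` and `target` and the toy data frames are assumptions for illustration only:

```r
# Hypothetical helper (not part of networkD3): TRUE when every link
# endpoint is a valid zero-based row index into the node data frame.
links_index_ok <- function(links, nodes,
                           source = "source", target = "target") {
  idx <- c(links[[source]], links[[target]])
  all(!is.na(idx)) && all(idx >= 0) && all(idx <= nrow(nodes) - 1)
}

nodes <- data.frame(name = c("a", "b", "c"), group = 1)

good <- data.frame(source = c(0, 1), target = c(1, 2))  # zero-based
bad  <- data.frame(source = c(1, 2), target = c(2, 3))  # one-based (R style)

links_index_ok(good, nodes)  # TRUE
links_index_ok(bad, nodes)   # FALSE: index 3 has no matching row
```

This only catches indices that fall outside the node list. Links that are shifted by one but still in range would pass, so it is a sanity check rather than a guarantee.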