
I am working with a large network with thousands of nodes and edges. A reprex of the network can be found in a previous question, "Number of Connected Nodes in a dendrogram".

However, when counting the nodes within the network, I ran into a problem: the node counts at one level do not add up to the count at the next level up. For example:

library(tidygraph)
library(ggraph)
library(tidyverse)

parent_child <- tribble(
  ~parent, ~child,
  "a", "b",
  "b", "c",
  "b", "d",
  "d", "e",
  "d", "f",
  "d", "g",
  "g", "z"
)

# converted to a dendrogram ------------

parent_child %>%
  as_tbl_graph() %>% 
  ggraph(layout = "dendrogram") +
  geom_node_point() +
  geom_node_text(aes(label = name),
                 vjust = -1,
                 hjust = -1) +
  geom_edge_elbow()

# Table of calculations ----------------------

parent_child %>% 
  as_tbl_graph() %>% 
  activate(nodes) %>% 
  mutate(n_community_out = local_size(order = graph_size(),
                                      mode = "out",
                                      mindist = 0)) %>% 
  as_tibble()

# Final Output Table -----------------------
#> # A tibble: 8 x 2
#>   name  n_community_out
#>   <chr>           <dbl>
#> 1 a                   8
#> 2 b                   7
#> 3 d                   5
#> 4 g                   2
#> 5 c                   1
#> 6 e                   1
#> 7 f                   1
#> 8 z                   1

The table above shows the number of nodes reachable outward from each starting node. However, why do the counts at one level not add up to the count at the next level? (node d + node c != node b) I've been trying to explain this to colleagues, but I can't adequately explain what the network is counting, or why adding up the counts at one level does not give the count at the next level up.

This problem is exacerbated in a network with thousands of nodes, which is difficult to display. Does anyone know how to explain why the node counts do not add up to the next level? Any help is greatly appreciated.

James Crumpler
Comment: I hope my solution below has solved your issue. Please accept it if it has proved useful to you :) Otherwise, happy to clarify. – Eric Leung Dec 10 '20 at 01:01

1 Answer


You're running into an off-by-one error. When comparing counts, you need to subtract one, because the way you're counting includes each node in its own count of connected nodes.

For your example of

Node D + Node C ?= Node B

your table gives the values

...
2 b                   7
3 d                   5
...
5 c                   1
...

You've intentionally set mindist = 0, so when counting the nodes reachable from a parent, the count includes that node itself.
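One way to see the self-count directly is to compute the same measure twice, once with mindist = 0 and once with mindist = 1, and compare. This is a minimal sketch assuming the same parent_child edge list; the column names with_self and without_self are only illustrative. Every row should differ by exactly 1.

```r
library(tidygraph)
library(dplyr)
library(tibble)

parent_child <- tribble(
  ~parent, ~child,
  "a", "b",
  "b", "c",
  "b", "d",
  "d", "e",
  "d", "f",
  "d", "g",
  "g", "z"
)

counts <- parent_child %>%
  as_tbl_graph() %>%
  activate(nodes) %>%
  mutate(
    # mindist = 0: the starting node itself is part of its own count
    with_self = local_size(order = graph_size(), mode = "out", mindist = 0),
    # mindist = 1: only nodes at least one edge away are counted
    without_self = local_size(order = graph_size(), mode = "out", mindist = 1)
  ) %>%
  as_tibble()

print(counts)
```

With mindist = 1, node c drops to 0 and node b drops to 6, which is exactly the 1 + 5 you get from c and d with mindist = 0.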

Here's a quick visual to see the directionality.

library(tidygraph)
library(ggraph)
library(tidyverse)

parent_child <- tribble(
  ~parent, ~child,
  "a", "b",
  "b", "c",
  "b", "d",
  "d", "e",
  "d", "f",
  "d", "g",
  "g", "z"
)

plot(as_tbl_graph(parent_child))

Created on 2020-11-25 by the reprex package (v0.3.0)

Node C has no outgoing edges, but because of mindist = 0 it counts itself, so its community size is 1, as in your table.

Node D can reach 4 nodes (e, f, g, z), and when we include D itself, its local neighborhood is a total of 5 nodes.
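To list exactly which nodes make up D's count of 5, one option is igraph's subcomponent() (igraph is the package tidygraph is built on, and a tbl_graph is also an igraph object). With mode = "out" it returns every vertex reachable from the start vertex, including the start vertex itself, which mirrors mindist = 0. A minimal sketch, assuming the same parent_child edge list; reach_d is an illustrative name:

```r
library(tidygraph)
library(tibble)

parent_child <- tribble(
  ~parent, ~child,
  "a", "b",
  "b", "c",
  "b", "d",
  "d", "e",
  "d", "f",
  "d", "g",
  "g", "z"
)

g <- as_tbl_graph(parent_child)

# Vertices reachable from "d" following edge direction (parent -> child);
# the result includes "d" itself
reach_d <- igraph::subcomponent(g, "d", mode = "out")$name

print(reach_d)
print(length(reach_d))
```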

Similarly, Node B counts every node reachable from it (c, d, e, f, g, z) and also counts itself, for a total of 7.

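So to compare the counts, subtract one from the parent's count to remove its self-count; what remains is the sum of its children's counts. This works here because every node in this tree has exactly one parent, so no descendant is counted under two children.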

Node D + Node C
=> 5 + 1
=> 6

Node B = 7
=> 7 - 1
=> 6
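The same arithmetic can be checked for every parent at once. This is a minimal sketch assuming each node has exactly one parent (true for this example tree): for each parent, sum the n_community_out of its direct children and compare that with the parent's own count minus one. The names check and children_total are only illustrative.

```r
library(tidygraph)
library(dplyr)
library(tibble)

parent_child <- tribble(
  ~parent, ~child,
  "a", "b",
  "b", "c",
  "b", "d",
  "d", "e",
  "d", "f",
  "d", "g",
  "g", "z"
)

counts <- parent_child %>%
  as_tbl_graph() %>%
  activate(nodes) %>%
  mutate(n_community_out = local_size(order = graph_size(), mode = "out", mindist = 0)) %>%
  as_tibble()

check <- parent_child %>%
  # attach each child's own count
  left_join(counts, by = c("child" = "name")) %>%
  group_by(parent) %>%
  summarise(children_total = sum(n_community_out), .groups = "drop") %>%
  # attach the parent's count
  left_join(counts, by = c("parent" = "name")) %>%
  # parent count = 1 (itself) + sum of its children's counts
  mutate(matches = n_community_out == children_total + 1)

print(check)
```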
Eric Leung
Comment: Thank you for the help in resolving the issue. I also discovered that there were multiple problems with the actual data I was using. When the network is not disjointed, it throws off the numbers as well, especially in a large network where it is difficult to see every node that joins together. Your solution described part of the problem I was experiencing... I still need to figure out how to split a non-disjointed network into disjointed sub-graphs :) – James Crumpler Dec 10 '20 at 12:45
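For the follow-up about splitting a network into separate pieces, one possible sketch, assuming "separate pieces" means weakly connected components (nodes linked by any path, ignoring edge direction), uses tidygraph's group_components() and then filters on the label. The extra "x" -> "y" edge and the names labelled, node_table, and pieces are hypothetical, added only for illustration. Note that this only separates unconnected pieces: if a node has more than one parent, the parent-minus-one arithmetic still fails within a piece, because a shared descendant is counted under each of its parents.

```r
library(tidygraph)
library(dplyr)
library(tibble)

# Hypothetical data: the tree from the question plus an unrelated "x" -> "y" pair,
# so the network has two separate pieces
edges <- tribble(
  ~parent, ~child,
  "a", "b",
  "b", "c",
  "b", "d",
  "d", "e",
  "d", "f",
  "d", "g",
  "g", "z",
  "x", "y"
)

labelled <- edges %>%
  as_tbl_graph() %>%
  activate(nodes) %>%
  # weak components ignore edge direction, so a parent and its descendants share a label
  mutate(component = group_components(type = "weak"))

node_table <- as_tibble(labelled)

# One tbl_graph per component; filtering nodes also drops edges attached to removed nodes
pieces <- lapply(sort(unique(node_table$component)), function(k) {
  labelled %>%
    activate(nodes) %>%
    filter(component == k)
})

# Number of nodes in each piece
print(sapply(pieces, igraph::gorder))
```

Each element of pieces can then be passed through the same local_size() pipeline as in the question.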