I have a network of more than 4,000 nodes and a list of edges (connections between pairs of nodes). All nodes should converge on a single central point, but I have no way of ordering the nodes, since they are not numbered or labeled in a way that makes reordering feasible.
What do I need? Based on the small example attached, I need every node to point toward node F (F must be reachable from all nodes), so that the undirected graph becomes a directed acyclic graph (DAG), with the restriction that there is only a single edge between each pair of nodes. I am allowed to remove edges if and only if doing so removes loops (e.g. A -> B, B -> A). I can't add edges either, since this is a real network and I can't create connections where they don't exist.
What I have is this:
library(igraph)
library(tidygraph)
library(ggraph)
library(tidyverse)
# edge list
edgelist <- tribble(
~from, ~to,
"A", "B",
"A", "C",
"B", "D",
"C", "D",
"C", "E",
"D", "E",
"D", "F")
# create the graph
g <- as_tbl_graph(edgelist)
# undirected graph
g %>%
ggraph(layout = "graphopt") +
geom_edge_link() +
geom_node_point(shape = 21, size = 18, fill = 'white') +
geom_node_text(aes(label = name), size = 3) +
theme_graph()
This is the procedure I came up with to do the sorting so that the list of edges would become a DAG:
s <- names(V(g))
# define the target node
target <- "F"
# exclude target from vertex list
vertex_list <- s[s != target]
# calculate the simple path of each node to the destination node (target)
route_list <- map(vertex_list, ~ all_simple_paths(graph = g,
from = .x,
to = target)) %>%
set_names(vertex_list) %>%
map(~ map(., ~ names(.x))) %>%
flatten() %>%
map(~ str_c(.x, collapse = ","))
# generate the list of ordered edges
ordered_edges <- do.call(rbind, route_list) %>%
as.data.frame(row.names = F) %>%
set_names("chain") %>%
group_by(chain) %>%
summarise(destination = str_split(chain, ","), .groups = "drop") %>%
mutate(
from = map(destination, ~ lag(.x)) %>%
map(~ .x[!is.na(.x)]),
to = map(destination, ~ lead(.x)) %>%
map(~ .x[!is.na(.x)]),
) %>%
select(from, to) %>%
unnest(cols = everything()) %>%
group_by(across(everything())) %>%
summarise(enlaces = n(), .groups = "drop") %>%
select(-enlaces)
Warning: once the graph reaches a certain size (say, 90 nodes), this algorithm generates loops that make the graph cyclic, so as an additional step I apply a Python function called feedback_arc_set to remove the edges that prevent the graph from being a DAG.
For simplicity, I am not including the necessary code to remove these loops, since in this specific example no loops are generated.
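For reference, a minimal sketch of what that loop-removal step could look like without leaving R, assuming an installed version of {igraph} that exports `feedback_arc_set()`. The toy edge list below is hypothetical data, not part of my network, and this is not the Python code I actually ran.

```r
library(igraph)

# Toy directed edge list (hypothetical data) with one cycle: A -> B -> C -> A
toy <- data.frame(from = c("A", "B", "C", "C"),
                  to   = c("B", "C", "A", "F"))
g_cyc <- graph_from_data_frame(toy, directed = TRUE)

# Edges whose removal breaks every cycle.
# "approx_eades" is a heuristic, so the set is not guaranteed to be minimal.
fas <- feedback_arc_set(g_cyc, algo = "approx_eades")

# Drop those edges and confirm that the result is acyclic
g_dag <- delete_edges(g_cyc, fas)
is_dag(g_dag)
```

The heuristic variant is used here because an exact minimum feedback arc set is expensive to compute on large graphs.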
# draw the graph again
as_tbl_graph(ordered_edges) %>%
ggraph(layout = "graphopt") +
geom_edge_link(arrow = arrow(length = unit(3, 'mm'),
type = "closed",
angle = 30),
end_cap = circle(7, 'mm')) +
geom_node_point(shape = 21, size = 18, fill = 'white') +
geom_node_text(aes(label = name), size = 3) +
theme_graph()
Created on 2021-07-07 by the reprex package (v2.0.0)
So what is the problem? The complexity of the algorithm when the number of nodes is greater than 2,000.
If I try this with, say, 2,000 nodes, the algorithm never finishes. I left it running for 24 hours and it did not complete; in fact, I found no way to tell whether it was still making progress. I found that the {igraph} function all_simple_paths uses DFS behind the scenes, but its complexity is O(|V|!), where |V| is the number of vertices and |V|! is the factorial of that number.
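To make the growth concrete, here is a small sketch that counts the simple paths between two vertices of a complete graph, the worst case for path enumeration. The helper name `count_paths` is made up for this illustration.

```r
library(igraph)

# Hypothetical helper: number of simple paths between the first and last
# vertex of the complete graph on n vertices
count_paths <- function(n) {
  g_full <- make_full_graph(n)
  length(all_simple_paths(g_full, from = 1, to = n))
}

# The counts grow roughly factorially with n
path_counts <- sapply(3:8, count_paths)
path_counts
```

Even at 8 vertices there are close to two thousand paths for a single pair of nodes, so enumerating every path from every node to the target cannot scale to thousands of nodes in a densely connected network.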
Is there a way to do this with less complexity?
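One direction that avoids path enumeration altogether, sketched under assumptions rather than offered as a definitive solution: run a single breadth-first search from the target and orient every edge from the endpoint farther from the target to the endpoint closer to it, breaking distance ties by vertex name. Every edge then goes "downhill" in a strict ordering of the vertices, so no cycle can form, and every node keeps at least one outgoing edge toward a neighbor that is one hop closer to the target. The function name `orient_towards` is hypothetical, and the sketch assumes the undirected network is connected.

```r
library(igraph)

# Hypothetical helper (not part of igraph): orient each undirected edge
# toward `target` using hop distances from one BFS.
orient_towards <- function(edges, target) {
  # Treat the edge list as undirected; drop duplicate edges and self-loops
  g_und <- igraph::simplify(graph_from_data_frame(edges, directed = FALSE))

  # Hop distance from every vertex to the target
  d <- distances(g_und, to = target)[, 1]
  stopifnot(all(is.finite(d)))  # assumption: the network is connected

  # Strict total order on vertices: distance first, name as tie-breaker
  ord <- order(d, names(d))
  pos <- setNames(seq_along(ord), names(d)[ord])

  el <- as_edgelist(g_und, names = TRUE)
  a <- el[, 1]
  b <- el[, 2]

  # If `a` comes earlier in the order (closer to the target), flip the edge
  swap <- unname(pos[a]) < unname(pos[b])
  data.frame(from = ifelse(swap, b, a),
             to   = ifelse(swap, a, b),
             stringsAsFactors = FALSE)
}

# Same small example as above
edges <- data.frame(from = c("A", "A", "B", "C", "C", "D", "D"),
                    to   = c("B", "C", "D", "D", "E", "E", "F"))
oriented <- orient_towards(edges, "F")
g_dir <- graph_from_data_frame(oriented, directed = TRUE)

is_dag(g_dir)                                             # acyclic?
all(is.finite(distances(g_dir, to = "F", mode = "out")))  # F reachable from all?
```

The cost is one BFS plus one pass over the edge list, i.e. linear in the number of nodes and edges, and no edge is added or removed. The direction chosen for edges between two nodes at the same distance is arbitrary (here alphabetical), so any unique node identifier would work as the tie-breaker.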