I wrote a recursive function in R for finding all s-t paths of a directed graph (no cycles). I used this page as my model: All possible paths from one node to another in a directed tree (igraph). It outputs the correct result, but it's slow. With small graphs that's no big deal; with large graphs it's an issue.

I'm new to R but have read that it performs significantly better when you avoid loops and use vectorization. I'm trying to wrap my head around it and am hoping you can help. My code:

```r
# Entry point: start the search from `start` with an empty path.
findAllPaths <- function(graph,start,end) {
  return(fastFindPaths(graph, start, end))
}

# Recursive depth-first search; `path` holds the nodes visited so far.
fastFindPaths <- function(graph, from, to, path) {
  if(missing(path)) path <- c()
  path <- cbind(path, from)
  if (from == to) return(path)
  paths <- c()
  # Out-neighbors of `from`.
  adjList <- get.adjlist(graph, mode="out")[[from]]
  for (child in adjList) {
    # Skip nodes that are already on the current path.
    if (!child %in% path) {
      childPaths <- fastFindPaths(graph, child, to, path)
      for (childPath in childPaths) paths <- c(paths, childPath)
    }
  }
  return(paths)
}
```

So, is this a candidate for vectorization? How can I speed this up? Any other tips you'd give someone learning R?

Thanks!

shl
  • Have you seen the `get.shortest.paths()` function in the `igraph` package? This looks like it does what you're after, so you could either use it or look at its source code for inspiration. To me, your problem doesn't look like it will vectorise easily. – Miff May 21 '14 at 08:54
  • One possible bottleneck is the `get.adjlist()` call. Basically, for every invocation of `fastFindPaths`, you construct the *entire* adjacency list of the graph only to get the neighbors of a single node (that is, the neighbors of `from`). This can be done much more efficiently using the `neighbors` function instead. – Tamás May 21 '14 at 11:13
  • Another bottleneck is the `paths <- c(paths, childPath)` call; I'm not that familiar with R but I strongly suspect that this will be quadratic since it allocates a new copy of `paths`, appends `childPath` to it and then discards the old `paths` vector. Can you simply pre-allocate a large vector with NA values and then start filling it (and grow the vector if it becomes full)? – Tamás May 21 '14 at 11:14
  • @Miff, the `get.shortest.paths()` function only returns the shortest path. I need *all* paths. Same for `get.all.shortest.paths()`. – shl May 21 '14 at 16:36
  • @Tamás your first suggestion is a good one. I changed from `get.adjlist()` to `neighborhood()` and saw a good speed increase. For a graph with 22 nodes and 107 edges the time for all paths dropped from ~26s to ~16s. I'll try your second suggestion. – shl May 21 '14 at 16:39
  • `neighborhood()` is still suboptimal because it includes the query node in the result. Use `neighbors()` instead. – Tamás May 21 '14 at 21:34
  • @Tamás ahh yes, much better. I was having to explicitly deal with the self-reference from `neighborhood()`. – shl May 23 '14 at 17:37
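
The two suggestions in the comments above (query a single node with `neighbors()`, and avoid growing one flat vector with `c()` inside the loop) could be combined roughly as follows. This is a minimal sketch rather than the asker's final code: `findPathsFrom` is a hypothetical helper name, vertices are assumed to be addressed by integer id, and instead of the pre-allocated vector proposed in the comments it collects each complete path as one list element and concatenates the lists once per call.

```r
library(igraph)

# Sketch only: findPathsFrom is a hypothetical helper, not an igraph function.
findAllPaths <- function(graph, start, end) {
  findPathsFrom(graph, start, end, integer(0))
}

findPathsFrom <- function(graph, from, to, path) {
  path <- c(path, from)
  # Reached the target: return this path as a one-element list.
  if (from == to) return(list(path))
  # Ask igraph only for the out-neighbors of `from` (assumes integer ids).
  children <- as.integer(neighbors(graph, from, mode = "out"))
  # Drop nodes that are already on the current path.
  children <- children[!children %in% path]
  # One recursive call per remaining child; each returns a list of paths.
  childPaths <- lapply(children, function(child) {
    findPathsFrom(graph, child, to, path)
  })
  # Flatten one level: list of lists of paths -> list of paths.
  do.call(c, c(list(list()), childPaths))
}

# Small example graph with edges 1->2, 1->3, 2->4, 3->4.
g <- make_graph(c(1, 2, 1, 3, 2, 4, 3, 4), directed = TRUE)
paths <- findAllPaths(g, 1, 4)
print(paths)
```

Returning a list keeps every path as its own vector, so the caller does not have to split a flat result back into paths.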

2 Answers

The development version of igraph has a `get.all.simple.paths()` function; you can get it from http://igraph.org/nightly.
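
For readers on a released version: as far as I know, current igraph releases expose this functionality under the name `all_simple_paths()`. A minimal usage sketch, assuming a reasonably recent igraph and a small made-up example graph:

```r
library(igraph)

# Example graph (an assumption for illustration): edges 1->2, 1->3, 2->4, 3->4.
g <- make_graph(c(1, 2, 1, 3, 2, 4, 3, 4), directed = TRUE)

# All simple paths from vertex 1 to vertex 4, following edge direction.
paths <- all_simple_paths(g, from = 1, to = 4, mode = "out")

# Each element is a vertex sequence; convert to plain integer ids to inspect.
print(lapply(paths, as.integer))
```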

Gabor Csardi
  • I used Tamas's suggestion to not call `get.adjlist()` but instead use `neighbors()` and that provided a decent speed increase. What really helped improve the performance, however, was parallelizing the search, by calling `fastFindPaths()` on each of the `start` node's children and aggregating the results. I'm currently on a Windows machine, so used the `clusterApply()` function. – shl May 23 '14 at 17:42

I used Tamas's suggestion to call `neighbors()` instead of `get.adjlist()`, and that provided a decent speed increase. What really improved performance, however, was parallelizing the search: I call `fastFindPaths()` on each of the `start` node's children and aggregate the results. I'm currently on a Windows machine, so I used the `clusterApply()` function. I will likely generalize the code to check `Sys.info()[1]` and use `mclapply()` when not on Windows.
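
The platform switch described above might look something like the following. This is a sketch under assumptions, not my actual code: `parallelOverChildren` and its `cores` argument are hypothetical names, and the `worker` function is assumed to return a list of paths for one child.

```r
library(parallel)

# Hypothetical helper: applies `worker` to each element of `children` in
# parallel and concatenates the per-child lists into a single list.
parallelOverChildren <- function(children, worker, cores = 2) {
  if (Sys.info()[["sysname"]] == "Windows") {
    # Forking is not available on Windows, so use a socket cluster.
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl))
    results <- clusterApply(cl, children, worker)
  } else {
    # Elsewhere, mclapply() forks the current R process.
    results <- mclapply(children, worker, mc.cores = cores)
  }
  do.call(c, c(list(list()), results))
}

# Toy worker standing in for the per-child path search.
res <- parallelOverChildren(1:3, function(x) list(x * 10))
print(res)
```

With a socket cluster, the worker processes start with empty sessions, so the graph, the search function, and igraph itself would presumably have to be made available to them first, for example with `clusterExport()` and `clusterEvalQ()`.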

shl