I have a directed graph G in Spark GraphX (Scala). I would like to find the number of edges that should be crossed starting from a known vertex v1 to arrive in another vertex v2. In other words, I need the shortest path from the vertex v1 to the vertex v2 calculated in number of edges (not using the edges' weights).

I was looking at the GraphX documentation, but I wasn't able to find a method to do it. This is also needed in order to compute the depth of the graph if it has a tree structure. Is there an easy way to do this?

mt88
  • You mean the number of edges between v1 and v2 or the number of possible paths from v1 to v2, passing by 0 or more additional vertices? – Daniel de Paula May 10 '16 at 19:04
  • Hey, I meant number of edges between v1 and v2 from its shortest path. Updated to make more clear. – mt88 May 10 '16 at 19:45
  • Sorry, I still don't understand. You want to find the shortest path between v1 and v2? When you say edges between v1 and v2, I think about the edges that directly connect v1 and v2, and not the shortest path. – Daniel de Paula May 10 '16 at 23:07
  • What I mean is: in my graph G, for vertices v1 and v2, I want to find the minimum number of edges I have to cross to get from v1 to v2. Does that make more sense? G is a directed graph. – mt88 May 10 '16 at 23:09
  • Ok, so you want to solve the [Shortest Path Problem](https://en.wikipedia.org/wiki/Shortest_path_problem)? I will make a suggestion as an answer. – Daniel de Paula May 10 '16 at 23:11
  • Yes that sounds right. Assuming the edge weights are 1. – mt88 May 10 '16 at 23:21

1 Answer

To find the shortest path between vertices using Spark GraphX, there is the ShortestPaths object, which is a member of the org.apache.spark.graphx.lib package.

Assuming you have a GraphX graph in g and you want to find the shortest path between the vertices with ids v1 and v2, you can do the following:

import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.ShortestPaths

val result = ShortestPaths.run(g, Seq(v2))

val shortestPath = result               // result is a graph whose vertex attributes are Map[VertexId, Int]
  .vertices                             // we get the vertices RDD of (vertexId, Map(target -> hops))
  .filter({case(vId, _) => vId == v1})  // we keep only the entry for v1
  .first                                // there's only one value (this throws if v1 is not in the graph)
  ._2                                   // the entry is a tuple (v1, Map), we keep the Map
  .get(v2)                              // Option[Int]: hops from v1 to v2, None if v2 is unreachable

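The ShortestPaths GraphX algorithm returns a graph where the vertices RDD contains tuples in the format (vertexId, Map(target -> shortestPath)). Here shortestPath is the length of the shortest path counted in edges (an Int), not the list of vertices along it, and edge attributes are ignored. This graph will contain all vertices of the original graph, and their shortest path lengths to all target vertices passed in the Seq argument of the algorithm. A vertex that cannot reach a given target simply has no entry for that target in its Map.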
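To make the shape of the result more concrete, here is a minimal sketch on a tiny made-up graph. The object name ShortestPathsShape, the helper exampleGraph, the vertex ids and the local SparkSession settings are all assumptions for illustration, not part of your setup. It passes two targets so that each vertex's Map can hold more than one entry.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.graphx.lib.ShortestPaths
import org.apache.spark.sql.SparkSession

object ShortestPathsShape {
  // Hypothetical directed graph: 1->2, 2->3, 3->4, 1->3, 4->5.
  // The edge attribute (1) is a placeholder; ShortestPaths does not read it.
  def exampleGraph(sc: SparkContext): Graph[Int, Int] = {
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 4L, 1),
      Edge(1L, 3L, 1), Edge(4L, 5L, 1)))
    Graph.fromEdges(edges, defaultValue = 0)
  }

  def main(args: Array[String]): Unit = {
    // Local session only for this sketch; reuse your own SparkContext in practice.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("shortest-paths-shape")
      .getOrCreate()

    val g = exampleGraph(spark.sparkContext)

    // Two targets (3 and 4): every vertex gets a Map with at most two entries.
    val result = ShortestPaths.run(g, Seq(3L, 4L))

    // Print one line per vertex: id -> Map(target -> number of edges).
    // In this graph the shortest route 1 -> 3 -> 4 has 2 edges,
    // and vertex 5 cannot reach either target.
    result.vertices.collect().sortBy(_._1).foreach {
      case (id, spMap) => println(s"$id -> $spMap")
    }

    spark.stop()
  }
}
```

Collecting the vertices is only reasonable here because the graph is tiny; on a real graph you would filter first, as in the snippet at the top of this answer.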

In your case, you want the shortest path between two specific vertices, so the ShortestPaths.run(g, Seq(v2)) call passes only one target (v2), and the filter on result.vertices keeps only the entry for the desired starting vertex (v1).
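You also mentioned the depth of a tree. One possible way to get it with the same algorithm is sketched below, under some assumptions: the edges point from parent to child, you know the root's id, and the names TreeDepthSketch, exampleTree and treeDepth are made up for this example. ShortestPaths measures distances from every vertex to the target while following edge direction, so the sketch reverses the graph first and then takes the largest distance to the root.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.graphx.lib.ShortestPaths
import org.apache.spark.sql.SparkSession

object TreeDepthSketch {
  // Hypothetical tree with parent -> child edges: 1->2, 1->3, 3->4, 4->5.
  def exampleTree(sc: SparkContext): Graph[Int, Int] = {
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1), Edge(1L, 3L, 1), Edge(3L, 4L, 1), Edge(4L, 5L, 1)))
    Graph.fromEdges(edges, defaultValue = 0)
  }

  // Depth in edges = largest hop count between the root and any vertex.
  // Assumes parent -> child edges; drop .reverse if yours point child -> parent.
  def treeDepth(tree: Graph[Int, Int], root: VertexId): Int =
    ShortestPaths.run(tree.reverse, Seq(root))
      .vertices
      // Vertices with no route to the root count as 0 in this sketch.
      .map { case (_, spMap) => spMap.getOrElse(root, 0) }
      .max()

  def main(args: Array[String]): Unit = {
    // Local session only for this sketch.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("tree-depth-sketch")
      .getOrCreate()

    val depth = treeDepth(exampleTree(spark.sparkContext), root = 1L)
    println(s"depth = $depth") // longest branch here is 1 -> 3 -> 4 -> 5, i.e. 3 edges

    spark.stop()
  }
}
```

This runs a single ShortestPaths job for the whole tree, which should be cheaper than querying the root-to-leaf distance once per leaf.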

Daniel de Paula
  • Hey. This worked great for individual pairs, but i'm getting a null pointer error when doing it for a list of pairs. I documented above. Do you happen to know what might be causing it? – mt88 May 11 '16 at 23:36
  • @mt88 Maybe there is a problem with the object G? Have you tried printing some of G's elements? – Daniel de Paula May 12 '16 at 00:15
  • Don't want to bury this question in my particular example, so I deleted my update. I posted a new question and reproducible example with that problem here: http://stackoverflow.com/questions/37175738/spark-scala-graphx-calling-shortest-path-within-a-map-function – mt88 May 12 '16 at 01:22
  • Daniel, am I correct in saying this only works for unweighted graphs? As in, all weights are taken as 1. – LearningSlowly Oct 24 '16 at 09:13
  • It's worth nothing that this gives the *length* of the shortest path, not the actual path. – Michael Mior Jun 13 '18 at 15:44
  • @MichaelMior Maybe the question text is ambiguous, but the OP wanted the "minimum number of edges between two vertices", so this answer is worth something, at least for him. – Daniel de Paula Jun 14 '18 at 07:59
  • For sure :) I just thought I would point that out in case someone came here looking for the path and got confused. – Michael Mior Jun 14 '18 at 12:02