
In GraphX, is there a way to retrieve all the nodes and edges that lie on paths of a given length?

More specifically, I would like to get all the 10-step paths from A to B. For each path, I would like to get the list of nodes and edges.

Thanks.

Mihai Chelaru
Inbal
  • Can you give a more concrete example? It's not clear what you are asking. For example, are you trying to find all the nodes that are part of a 5-node pathway? Part of a specific pathway? Best would be if you gave some sample data and the results you would like to see based on that data. – David Griffin May 24 '16 at 18:46
  • There's no API to do this, and it's not a trivial thing. To solve it, you need to compute all the many possible "routes" through the graph. There are APIs like `aggregateMessages` and/or `pregel` that will allow you to build the logic, but like I said -- not a trivial thing. – David Griffin May 25 '16 at 14:39
  • Which environment will fulfill my needs? What about Gremlin over Titan over Apache Spark? Can using Gremlin fulfill my requirements? – Inbal May 26 '16 at 09:57
  • You can do it with Spark -- I'm just saying it's non-trivial. I'm sure it's non-trivial in those other environments too. Think about it this way: if you have 100 nodes, where each node is connected to 10 other nodes, you could have 300 million+ pathways of length 10. (10 factorial is about 3.6 million, and then multiply that by the number of nodes -- the actual number would depend on the topology, but you get the idea.) – David Griffin May 26 '16 at 11:05
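
To make the "compute all the possible routes" point from the comments concrete, here is a minimal plain-Scala sketch (no Spark) that enumerates every simple path with exactly a given number of edges between two nodes. The object name `PathEnumeration` and its helpers are hypothetical, made up for illustration; it assumes the whole edge list fits in memory, so it only shows the logic and is not a distributed solution.

```scala
object PathEnumeration {
  // Directed edges as (src, dst) pairs.
  type Edge = (String, String)

  /** All simple paths (no repeated node) from `from` to `to` using exactly `steps` edges. */
  def pathsOfLength(edges: Seq[Edge], from: String, to: String, steps: Int): Seq[List[String]] = {
    // Adjacency list: node -> direct successors
    val next: Map[String, Seq[String]] =
      edges.groupBy(_._1).map { case (src, es) => src -> es.map(_._2) }

    // Depth-first extension; `visited` holds the path so far in reverse order.
    def extend(visited: List[String], remaining: Int): Seq[List[String]] = {
      val current = visited.head
      if (remaining == 0) {
        if (current == to) Seq(visited.reverse) else Seq.empty
      } else {
        next.getOrElse(current, Seq.empty)
          .filterNot(visited.contains)
          .flatMap(n => extend(n :: visited, remaining - 1))
      }
    }

    extend(List(from), steps)
  }

  /** Edges along a (non-empty) node path, as consecutive (src, dst) pairs. */
  def edgesOf(path: List[String]): List[Edge] = path.zip(path.tail)
}
```

Calling `PathEnumeration.pathsOfLength(edges, "A", "B", 10)` would give the node lists, and `edgesOf` the matching edge lists. The recursion branches once per outgoing edge at every step, which is why the number of candidate paths grows so quickly on a well-connected graph.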

1 Answer


Disclaimer: This is only intended to show GraphFrames path filtering capabilities.

Well, theoretically speaking, it is possible. You can use GraphFrames patterns to find paths. Let's assume your data looks as follows:

import org.graphframes.GraphFrame
// toDF needs the session implicits (already in scope in spark-shell)
import spark.implicits._

val nodes = "abcdefghij".map(c => Tuple1(c.toString)).toDF("id")

val edges = Seq(
  // Long path
  ("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"),
  // and some random nodes
  ("g", "h"), ("i", "j"), ("j", "i")
).toDF("src", "dst")

val gf = GraphFrame(nodes, edges)

and you want to find all paths that visit 5 nodes (that is, 4 edges).

You can construct the following path pattern:

val path = (1 to 4).map(i => s"(n$i)-[e$i]->(n${i + 1})").mkString(";")
// (n1)-[e1]->(n2);(n2)-[e2]->(n3);(n3)-[e3]->(n4);(n4)-[e4]->(n5)

and a filter expression to avoid cycles:

import org.apache.spark.sql.functions.col
// =!= replaces !==, which is deprecated since Spark 2.0
val expr = (1 to 5).map(i => s"n$i").combinations(2).map {
  case Seq(i, j) => col(i) =!= col(j)
}.reduce(_ && _)
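
For the 10-step case from the question, the same two pieces can be generated for any number of steps. The sketch below is plain Scala that only builds strings; the object name `PathMotif` and its methods are hypothetical helpers, and the condition is written as a SQL string on the `id` field (an assumption that node ids are what should differ) rather than as a `Column`.

```scala
object PathMotif {
  /** Motif string for a chain of `steps` edges, e.g. steps = 2 gives
    * "(n1)-[e1]->(n2);(n2)-[e2]->(n3)". */
  def chain(steps: Int): String = {
    require(steps >= 1, "a path needs at least one edge")
    (1 to steps).map(i => s"(n$i)-[e$i]->(n${i + 1})").mkString(";")
  }

  /** SQL condition requiring the steps + 1 nodes of the chain to have pairwise distinct ids. */
  def distinctNodes(steps: Int): String = {
    require(steps >= 1, "a path needs at least one edge")
    (1 to steps + 1).map(i => s"n$i.id").combinations(2).map {
      case Seq(a, b) => s"$a != $b"
    }.mkString(" AND ")
  }
}
```

Something like `gf.find(PathMotif.chain(10)).where(PathMotif.distinctNodes(10))` should then cover 10-step paths, since `where` also accepts a SQL string. Keep in mind that each extra step adds another join, so this is unlikely to stay practical on a large, dense graph.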

Finally, a quick check:

gf.find(path).where(expr).show
// +-----+---+---+-----+---+-----+---+-----+---+
// |   e1| n1| n2|   e2| n3|   e3| n4|   e4| n5|
// +-----+---+---+-----+---+-----+---+-----+---+
// |[a,b]|[a]|[b]|[b,c]|[c]|[c,d]|[d]|[d,e]|[e]|
// |[b,c]|[b]|[c]|[c,d]|[d]|[d,e]|[e]|[e,f]|[f]|
// +-----+---+---+-----+---+-----+---+-----+---+
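
The question asks specifically for paths from A to B, so the result above still has to be narrowed to fixed endpoints. A minimal sketch, assuming simple string ids without quote characters: build a SQL condition that pins the first and last node of the chain. `EndpointFilter` is a hypothetical helper name, not part of GraphFrames.

```scala
object EndpointFilter {
  /** SQL condition pinning the first and last node of a chain with `steps` edges.
    * Assumes ids contain no single quotes; no escaping is attempted. */
  def between(steps: Int, from: String, to: String): String = {
    require(steps >= 1, "a path needs at least one edge")
    s"n1.id = '$from' AND n${steps + 1}.id = '$to'"
  }
}
```

With the sample data, `gf.find(path).where(expr).where(EndpointFilter.between(4, "a", "e"))` should keep only the path that starts at a and ends at e. Each remaining row then holds the nodes in the `n*` columns and the edges in the `e*` columns.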
zero323