
I was testing the GraphFrames BFS toy example:

val g: GraphFrame = examples.Graphs.friends
val paths: DataFrame = g.bfs.fromExpr("name = 'Esther'").toExpr("name <> 'Esther'").run()

The result I get is:

+-------------+------------+------------+
|         from|          e0|          to|
+-------------+------------+------------+
|[e,Esther,32]|[e,f,follow]|[f,Fanny,36]|
|[e,Esther,32]|[e,d,friend]|[d,David,29]|
+-------------+------------+------------+

That's odd, since Fanny and David also have outgoing edges, and so do the vertices linked to them. In other words, I expected the result DataFrame to contain not only one-hop paths but all paths from the source vertex.

I also created a toy graph of my own (one edge per line, source then destination):

1 2
2 3
3 4
4 5

And when I do the same kind of query:

g.bfs.fromExpr("id = 1").toExpr("id <> 1").run() 

I still get only the one-hop neighbors. Am I missing something? I also tested other "not equal" operators, without success. A wild guess: maybe when BFS reaches the source vertex again (it should look at it, but not visit its neighbors), the vertex does not match the "toExpr" expression and the search aborts.

Another question: GraphFrames graphs are directed, aren't they? To get an undirected graph, I should add reciprocal edges, shouldn't I?

Daniel
  • Daniel, can you help me understand the statement `toExpr("name <> 'Esther'")`? I am not a Scala user, but I use GraphFrames in Python. I understand your fromExpr expression. – Hardik Gupta Dec 08 '16 at 04:23
  • It's the SQL "not equal" operator. I also tested '!=' and 'NOT LIKE' instead of '<>'. – Daniel Dec 08 '16 at 17:18

1 Answer

Upon reaching Fanny and David, you've found the shortest path from Esther to a non-Esther node, so the search stops.

According to the GraphFrames User Guide, the bfs method "finds the shortest path(s) from one vertex (or a set of vertices) to another vertex (or a set of vertices). The beginning and end vertices are specified as Spark DataFrame expressions."

In the graph you're using, the shortest path from Esther to a non-Esther node is just one hop, so the breadth-first search stops there.

Consider your numeric toy graph. You're finding this (one hop):

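import org.graphframes.GraphFrame

// Edges of the toy graph: 1 -> 2 -> 3 -> 4 -> 5
val edgesDf = spark.sqlContext.createDataFrame(Seq(
  (1, 2),
  (2, 3),
  (3, 4),
  (4, 5)
)).toDF("src", "dst")

// Build the graph from the edges alone; vertex ids are inferred from src and dst
val g = GraphFrame.fromEdges(edgesDf)
// Shortest path(s) from vertex 1 to any vertex other than 1
g.bfs.fromExpr("id = 1").toExpr("id <> 1").run().show()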

+----+-----+---+
|from|   e0| to|
+----+-----+---+
| [1]|[1,2]|[2]|
+----+-----+---+

Suppose you queried it like this instead:

g.bfs.fromExpr("id = 1").toExpr("id > 3").run().show()

+----+-----+---+-----+---+-----+---+
|from|   e0| v1|   e1| v2|   e2| to|
+----+-----+---+-----+---+-----+---+
| [1]|[1,2]|[2]|[2,3]|[3]|[3,4]|[4]|
+----+-----+---+-----+---+-----+---+

Now the bfs method takes three hops. This is the shortest path from 1 to a node whose id is greater than 3. Even though there's an edge from 4 to 5 (and 5 > 3), the search doesn't continue, because that would be a longer path (four hops).
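
To make the stopping rule concrete, here is a minimal plain-Scala sketch (no Spark involved) of a level-by-level search that returns as soon as any vertex at the current depth matches the target predicate. It is only an illustration of the behavior described above, not GraphFrames' actual implementation; the names BfsSketch, shortestPaths, isTarget and maxDepth are made up for this example, and the zero-hop case (the source itself matching the target) is ignored.

```scala
object BfsSketch {
  type Edge = (Int, Int)

  // Returns every shortest path (as a list of vertex ids) from `source`
  // to a vertex satisfying `isTarget`. Hypothetical helper, not a GraphFrames API.
  def shortestPaths(
      edges: Seq[Edge],
      source: Int,
      isTarget: Int => Boolean,
      maxDepth: Int = 10
  ): Seq[List[Int]] = {
    @annotation.tailrec
    def loop(frontier: Seq[List[Int]], visited: Set[Int], depth: Int): Seq[List[Int]] =
      if (frontier.isEmpty || depth >= maxDepth) Seq.empty
      else {
        // Extend every path in the frontier by one outgoing edge.
        // Paths are stored in reverse, so `path.head` is the last vertex reached.
        val next = for {
          path <- frontier
          (src, dst) <- edges
          if src == path.head && !visited.contains(dst)
        } yield dst :: path

        val hits = next.filter(path => isTarget(path.head))
        // Stop at the first depth that produces a match; deeper matches are never explored.
        if (hits.nonEmpty) hits.map(_.reverse)
        else loop(next, visited ++ next.map(_.head), depth + 1)
      }

    loop(Seq(List(source)), Set(source), 0)
  }
}
```

On the numeric toy graph, calling BfsSketch.shortestPaths with the predicate `_ != 1` stops after one hop at vertex 2, while the predicate `_ > 3` walks three hops to vertex 4 and never looks at vertex 5.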

Another question: GraphFrames graphs are directed, aren't they? To get an undirected graph, I should add reciprocal edges, shouldn't I?

I think it depends on the algorithm you want to apply to the graph. Someone could write an algorithm that ignores the direction in the underlying edges DataFrame. But if an algorithm assumes a directed graph, then I think you're right: you'd have to add reciprocal edges.
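
If you do go the reciprocal-edges route, the idea can be sketched in plain Scala on a list of (src, dst) pairs before building the edges DataFrame. The names Reciprocal and withReciprocalEdges are hypothetical, and this assumes edges carry no attributes other than their endpoints.

```scala
object Reciprocal {
  // Appends the reverse of every edge, then drops duplicates so an edge
  // that already existed in both directions is not repeated.
  def withReciprocalEdges(edges: Seq[(Int, Int)]): Seq[(Int, Int)] =
    (edges ++ edges.map { case (src, dst) => (dst, src) }).distinct
}
```

The resulting sequence could then be passed to createDataFrame and toDF("src", "dst") in the same way as the original edge list.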

You may get a better response (from someone else) if you ask this as a separate question.

matw