I'm having trouble understanding BFS in GraphFrames. I'm trying to find the "father of all": the vertex that has no parent in the graph.

I have this DataFrame:

val df = sqlContext.createDataFrame(List(
  ("153030152492012801800", ""),
  ("153030152492012801845", ""),
  ("153030152492013801220","153030152492012801845"),
  ("153030152492013800151","153030152492012801845"),
  ("153030152492014800546","153030152492012801845"),
  ("153030152492013800497", "153030152492013800151"),
  ("153030152492013801860", "153030152492013800151"),
  ("153030152492014800038", "153030152492013801860"),
  ("153030152492014801015", "153030152492014800038"),
  ("153030152492014801235", "153030152492014801015")
  )).toDF("id", "parent")

I build my graph like this:

import org.apache.spark.sql.functions.lit
import org.graphframes.GraphFrame
val df_vertices = df.selectExpr("id", "parent")
val df_edges = df.withColumnRenamed("id", "src").withColumnRenamed("parent", "dst").withColumn("relation", lit("parent"))
val g = GraphFrame(df_vertices, df_edges)

Finally, I run BFS as:

g.bfs.fromExpr("id = 153030152492013800151").toExpr("parent = ''").run().show(false)

But I'm getting this:

+----------------------------------------------+------------------------------------------------------+-------------------------+
|from                                          |e0                                                    |to                       |
+----------------------------------------------+------------------------------------------------------+-------------------------+
|[153030152492013801220, 153030152492012801845]|[153030152492013801220, 153030152492012801845, parent]|[153030152492012801845, ]|
|[153030152492013800151, 153030152492012801845]|[153030152492013800151, 153030152492012801845, parent]|[153030152492012801845, ]|
+----------------------------------------------+------------------------------------------------------+-------------------------+

My question: why am I getting the first row, the one that starts with "153030152492013801220"? That vertex is not on the path from "153030152492013800151", so why is it in the result?

Thank you all!

1 Answer

I figured out my error. The "id" column is a string, so the literal in the filter expression has to be quoted. I had to call BFS as:

g.bfs.fromExpr("id = '153030152492013800151'").toExpr("parent = ''").run().show(false)

I'm posting the answer because GraphFrames doesn't raise any error for the unquoted version; it just returns the wrong rows. Consider this a warning.
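
A likely explanation for the extra row, stated as an assumption rather than something confirmed for this Spark version: when a string column is compared with an unquoted numeric literal, Spark SQL coerces both sides to a numeric type instead of failing. If that type is a double, 21-digit ids that differ only in their last few digits can round to the same value. The sketch below is plain Scala with no Spark involved, and the object and method names are made up for the illustration. It only shows that the two ids from the question are different strings but the same double:

```scala
// Hypothetical demo object; it does not call Spark or GraphFrames.
object IdPrecisionDemo {
  // The id used in fromExpr and the unexpected id from the first result row.
  val target = "153030152492013800151"
  val other  = "153030152492013801220"

  // Comparison a quoted literal gives you: exact string equality.
  def equalAsStrings(a: String, b: String): Boolean = a == b

  // Comparison after a cast to double: only about 15-17 significant
  // decimal digits survive, so nearby 21-digit values can collide.
  def equalAsDoubles(a: String, b: String): Boolean = a.toDouble == b.toDouble

  def main(args: Array[String]): Unit = {
    println(equalAsStrings(target, other)) // false
    println(equalAsDoubles(target, other)) // true
  }
}
```

With the quoted literal the comparison stays string to string, so only the exact id matches.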