I'm having trouble understanding BFS in GraphFrames. I'm trying to find the "father of all": the vertex that has no parent in the graph.
I have this DataFrame:
val df = sqlContext.createDataFrame(List(
("153030152492012801800", ""),
("153030152492012801845", ""),
("153030152492013801220","153030152492012801845"),
("153030152492013800151","153030152492012801845"),
("153030152492014800546","153030152492012801845"),
("153030152492013800497", "153030152492013800151"),
("153030152492013801860", "153030152492013800151"),
("153030152492014800038", "153030152492013801860"),
("153030152492014801015", "153030152492014800038"),
("153030152492014801235", "153030152492014801015")
)).toDF("id", "parent")
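I build my graph as follows:
import org.apache.spark.sql.functions.lit
import org.graphframes.GraphFrame
// Vertices keep both columns, so "parent" is available as a vertex attribute
val df_vertices = df.selectExpr("id", "parent")
// Each edge points from a child (src) to its parent (dst)
val df_edges = df.withColumnRenamed("id", "src").withColumnRenamed("parent", "dst").withColumn("relation", lit("parent"))
val g = GraphFrame(df_vertices, df_edges)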
Finally, I run BFS as:
g.bfs.fromExpr("id = 153030152492013800151").toExpr("parent = ''").run().show(false)
But I'm getting this:
+----------------------------------------------+------------------------------------------------------+-------------------------+
|from |e0 |to |
+----------------------------------------------+------------------------------------------------------+-------------------------+
|[153030152492013801220, 153030152492012801845]|[153030152492013801220, 153030152492012801845, parent]|[153030152492012801845, ]|
|[153030152492013800151, 153030152492012801845]|[153030152492013800151, 153030152492012801845, parent]|[153030152492012801845, ]|
+----------------------------------------------+------------------------------------------------------+-------------------------+
My question is: why am I getting the first row, the one that starts with "153030152492013801220"? That value is not on the path from "153030152492013800151" to its root, so why is it in the result?
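One thing I checked while narrowing this down, in case it is relevant: the id in my fromExpr is not quoted, while the id column holds strings. Assuming Spark ends up comparing the two as doubles (I have not confirmed that this is what it does), the small plain-Scala check below, which needs no Spark, tests whether the two ids are still distinguishable after conversion to Double. The object and method names are just mine for illustration.

```scala
// Plain Scala, no Spark required.
// Assumption: the unquoted literal in fromExpr causes the string id column
// to be compared as a Double. This only checks what such a comparison would see.
object IdPrecisionCheck {
  val wanted = "153030152492013800151"     // the id I passed to fromExpr
  val unexpected = "153030152492013801220" // the id in the surprising first row

  // True when two ids are indistinguishable after conversion to Double
  def sameAsDouble(a: String, b: String): Boolean = a.toDouble == b.toDouble

  def main(args: Array[String]): Unit = {
    // The ids differ by 1069, which is smaller than the spacing between
    // adjacent doubles at this magnitude (2^15 = 32768)
    println(sameAsDouble(wanted, unexpected)) // true
    println(wanted == unexpected)             // false: as strings they differ
  }
}
```

If that assumption is right, quoting the id in the expression (id = '153030152492013800151') should keep the comparison on strings, but I would still like to understand what BFS is doing here.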
Thank you all!