In my graph I need to detect vertices that have no inbound edges. In the example below, "a" is the only vertex that no other vertex points to.

a -->  b
b -->  c
c -->  d
c -->  b

I would really appreciate any examples showing how to detect "a"-type vertices in my graph.

Thanks

webber
  • So basically you are asking for vertices with [in-degree](https://graphframes.github.io/api/scala/index.html#org.graphframes.GraphFrame@inDegrees:org.apache.spark.sql.DataFrame) equal to 0, right? – zero323 Nov 10 '18 at 22:32
  • After looking through the linked docs, it appears inDegrees should be 0, right? I really hope it's that easy. – webber Nov 10 '18 at 22:35
  • If my understanding of your description is correct, the answer is yes. – zero323 Nov 11 '18 at 19:13

1 Answer

Unfortunately, the approach is not that simple, because graph.degrees, graph.inDegrees, and graph.outDegrees only return vertices that have at least one edge of the respective kind; vertices with a degree of 0 are left out. (See the Scala documentation, which holds true for Python too: https://graphframes.github.io/graphframes/docs/_site/api/scala/index.html#org.graphframes.GraphFrame)

So each of the following filters will always return an empty DataFrame:

from graphframes import GraphFrame

g = GraphFrame(vertices, edges)

# check for start points 
g.inDegrees.filter("inDegree==0").show()
+---+--------+
| id|inDegree|
+---+--------+
+---+--------+

# or check for end points 
g.outDegrees.filter("outDegree==0").show()
+---+---------+
| id|outDegree|
+---+---------+
+---+---------+

# or check for isolated vertices (no edges at all)
g.degrees.filter("degree==0").show()
+---+------+
| id|degree|
+---+------+
+---+------+
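
One way to see why these filters come back empty: as far as I can tell, the degree DataFrames are derived by grouping the edge list, so an id that never appears as a destination never gets a row at all. A plain-Python sketch of that idea, using the edges from the question, with `Counter` standing in for the group-by (an illustration, not the GraphFrames implementation):

```python
from collections import Counter

# Edge list from the question: (src, dst) pairs.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "b")]

# Stand-in for inDegrees: count how often each id appears as a destination.
in_degrees = Counter(dst for _, dst in edges)

# "a" never appears as a destination, so there is no entry to filter on.
zero_in_degree = [v for v, n in in_degrees.items() if n == 0]

print(dict(in_degrees))  # {'b': 2, 'c': 1, 'd': 1}
print(zero_in_degree)    # []
```

Every count that exists is at least 1, which is why a filter on 0 can never match.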

What works is a left, right, or full join of the inDegrees and outDegrees results, followed by a filter on the NULL values of the respective column.

The full join gives you the merged columns, with NULL in inDegree for start vertices and NULL in outDegree for end vertices:

g.inDegrees.join(g.outDegrees,on="id",how="full").show()

+---+--------+---------+
| id|inDegree|outDegree|
+---+--------+---------+
| b6|       1|     null|
| a3|       1|        1|
| a4|       1|     null|
| c7|       1|        1|
| b2|       1|        2|
| c9|       3|        1|
| c5|       1|        1|
| c1|    null|        1|
| c6|       1|        1|
| a2|       1|        1|
| b3|       1|        1|
| b1|    null|        1|
| c8|       3|     null|
| a1|    null|        1|
| c4|       1|        4|
| c3|       1|        1|
| b4|       1|        1|
| c2|       1|        3|
|c10|       1|     null|
| b5|       2|        1|
+---+--------+---------+

Now you can filter for whatever you are looking for:

my_in_Degrees=g.inDegrees
my_out_Degrees=g.outDegrees

# get starting vertices (no parents, i.e. no inbound edges)
my_in_Degrees.join(my_out_Degrees,on="id",how="full").filter(my_in_Degrees.inDegree.isNull()).show()
+---+--------+---------+
| id|inDegree|outDegree|
+---+--------+---------+
| c1|    null|        1|
| b1|    null|        1|
| a1|    null|        1|
+---+--------+---------+


# get ending vertices (no children, i.e. no outbound edges)
my_in_Degrees.join(my_out_Degrees,on="id",how="full").filter(my_out_Degrees.outDegree.isNull()).show()
+---+--------+---------+
| id|inDegree|outDegree|
+---+--------+---------+
| b6|       1|     null|
| a4|       1|     null|
|c10|       1|     null|
+---+--------+---------+
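
Note that a vertex with no edges at all appears in neither inDegrees nor outDegrees, so the join cannot find it. A minimal plain-Python sketch of an alternative, assuming you start from the full vertex list and drop every id that appears as a destination (or source). The vertex "e" is a hypothetical isolated vertex added for illustration. On DataFrames the same idea should be expressible as a `left_anti` join of `g.vertices` against `g.inDegrees` on `id`, though I have not tested that here.

```python
# Vertices and edges from the question, plus a hypothetical isolated vertex "e".
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "b")]

# Ids that have at least one inbound / outbound edge.
has_inbound = {dst for _, dst in edges}
has_outbound = {src for src, _ in edges}

# Start vertices: nothing points at them (isolated vertices included).
no_inbound = [v for v in vertices if v not in has_inbound]
# End vertices: they point at nothing (isolated vertices included).
no_outbound = [v for v in vertices if v not in has_outbound]

print(no_inbound)   # ['a', 'e']
print(no_outbound)  # ['d', 'e']
```

Starting from the vertex list rather than from the degree results is what lets the isolated vertex show up in both answers.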

Alex Ortner