I am new to Apache Spark. Can I get a snippet showing how to implement 'flattening' of a dependency graph? For example, say I have: nodes: A, B, C; edges: (A,B), (B,C)
The result would be a new graph: nodes: A, B, C; edges: (A,B), (A,C), (B,C)
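The expected output above is the transitive closure of the edge set: (A,C) is added because C is reachable from A through B. Below is a minimal sketch of the usual iterative approach. It is written in plain Python so it runs without a Spark cluster. The loop keeps joining the paths found so far with the original edges until no new pair appears. In Spark, the inner comprehension would become a join (RDD or DataFrame) of the paths with the edges inside the same loop. `transitive_closure` is a hypothetical helper name, not a Spark API.

```python
def transitive_closure(edges):
    """Return every (src, dst) pair such that dst is reachable from src.

    Repeatedly "joins" the current paths (a, b) with the original
    edges (b, d) to get new paths (a, d), until nothing new appears.
    """
    closure = set(edges)
    while True:
        # Join: a path ending at b extended by an edge starting at b.
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in edges if b == c}
        if new_pairs <= closure:
            # Fixed point reached: no new reachable pairs.
            return closure
        closure |= new_pairs


edges = {("A", "B"), ("B", "C")}
print(sorted(transitive_closure(edges)))
```

The loop always terminates, because the closure can never hold more than (number of nodes)² pairs. Each pass adds paths that are one edge longer.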
1) Assume each node is in its own row:
A
B
C
2) CROSS JOIN the table with itself:
A A
A B
A C
B A
B B
B C
C A
C B
C C
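Here is a plain-Python sketch of steps 1 and 2, assuming the nodes fit in a simple list. The cross join is just the Cartesian product of the list with itself. In PySpark the corresponding call is `DataFrame.crossJoin`, applied to the node DataFrame and an aliased copy of it.

```python
from itertools import product

# Step 1: one node per row.
nodes = ["A", "B", "C"]

# Step 2: CROSS JOIN the nodes with themselves (Cartesian product).
cross = list(product(nodes, repeat=2))
for left, right in cross:
    print(left, right)
```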
3) Filter out all rows where the same node appears in both columns:
A B
A C
B A
B C
C A
C B
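Here is the filtering step as a plain-Python sketch, which keeps only pairs of distinct nodes. In PySpark this would be a `filter` on the cross-joined DataFrame, for example `col("src") != col("dst")`. The column names `src` and `dst` are hypothetical.

```python
from itertools import product

nodes = ["A", "B", "C"]
cross = product(nodes, repeat=2)

# Step 3: drop rows where the node name is repeated (self-pairs).
pairs = [(left, right) for left, right in cross if left != right]
for left, right in pairs:
    print(left, right)
```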
4) Then derive a third field by concatenating the two node names; this field identifies the edge:
A B AB
A C AC
B A BA
B C BC
C A CA
C B CB
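This plain-Python sketch adds the derived edge field to each remaining pair by string concatenation. In PySpark, a similar column could be added with `withColumn("edge", concat(col("src"), col("dst")))` from `pyspark.sql.functions`. As before, the column names `src`, `dst` and `edge` are hypothetical.

```python
from itertools import product

nodes = ["A", "B", "C"]
pairs = [(left, right) for left, right in product(nodes, repeat=2) if left != right]

# Step 4: derive a third field naming the edge by concatenating both nodes.
rows = [(left, right, left + right) for left, right in pairs]
for row in rows:
    print(*row)
```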
You would still need to translate this into Spark's Scala or Python syntax. Hope this helps.