Calculate multiple step connections in GraphX on Spark

Question

I have been looking through the GraphX on Spark documentation, and I am trying to work out how to calculate all the two-step (and potentially further-step) connections in a graph.

If I have the following structure:

A -> b
b -> C
b -> D

Then A is connected to C and D via b: (A -> b -> C) and (A -> b -> D).

I had a look at the connected components functions, but I am not sure how to extend them to this. In reality b will be a different vertex type, but I am not sure whether that has an effect.

Any suggestions would be greatly appreciated; I am pretty new to GraphX.

It is pretty trivial with `Graphframes`. See for example: http://stackoverflow.com/q/37417469/1560062 — zero323, Jun 01 '16 at 17:47
Thanks for the response I will definitely have a look at using graph frames. Are you aware if I can do this using what is natively included in graphx instead of graph frames? — SChorlton, Jun 01 '16 at 19:07

score 0 · Answer 1 · answered Jun 28 '16 at 15:16

It seems you just need to use the collectNeighborIds action and then join the result with a reversed copy of itself. I wrote some code:

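import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val graph : Graph[Int, Int] = ...
// (vertex, ids of its direct out-neighbors)
val bros = graph.collectNeighborIds(EdgeDirection.Out)
// reversed pairs: (neighbor, vertex)
val flat = bros.flatMap(x => x._2.map(y => (y, x._1)))
// join on the intermediate vertex, then merge the second neighbors per source vertex
val brosofbros : RDD[(VertexId, Array[VertexId])] = flat.join(bros)
  .map(x => (x._2._1, x._2._2))
  .reduceByKey(_ ++ _)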

Finally, 'brosofbros' contains each vertex id together with all of its second neighbors; in your example it would be [A, Array[C, D]] (b itself is not included among A's second neighbors).
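For the "potentially further step" part of the question, the same join can in principle be repeated once per extra step. The following is only a sketch under a few assumptions: the object name `KHop` and the function `kHopNeighbors` are made up for illustration, edges are followed in the outgoing direction only, and results are de-duplicated into a `Set` (the code above keeps duplicates in an `Array`). It uses plain RDD joins rather than any built-in GraphX k-hop routine.

```scala
import org.apache.spark.graphx.{EdgeDirection, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of GraphX.
object KHop {
  // For each vertex, the set of vertices reachable in exactly `k` outgoing steps.
  // Vertices with no such connection do not appear in the result.
  def kHopNeighbors(graph: Graph[Int, Int], k: Int): RDD[(VertexId, Set[VertexId])] = {
    require(k >= 1, "k must be at least 1")

    // (vertex, ids of its direct out-neighbors)
    val out: RDD[(VertexId, Array[VertexId])] =
      graph.collectNeighborIds(EdgeDirection.Out)

    // (source, vertex reached after one step)
    var frontier: RDD[(VertexId, VertexId)] =
      out.flatMap { case (src, nbrs) => nbrs.iterator.map(n => (src, n)) }

    // Each pass extends every path by one more step.
    for (_ <- 2 to k) {
      frontier = frontier
        .map { case (src, reached) => (reached, src) } // key by the vertex reached so far
        .join(out)                                     // (reached, (src, neighbors of reached))
        .flatMap { case (_, (src, next)) => next.iterator.map(n => (src, n)) }
        .distinct()                                    // drop paths ending at the same vertex
    }

    frontier.groupByKey().mapValues(_.toSet)
  }
}
```

With the example graph from the question and k = 2, this should yield a single entry mapping A to the set {C, D}. Note that on graphs with cycles a vertex can show up among its own k-step neighbors, and each extra step costs another shuffle, so this is only reasonable for small k.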
