I have an RDD of string pairs (in Scala):
val seqRDD = sc.parallelize(Seq(("a","b"),("b","c"),("c","a"),("d","b"),("e","c"),("f","b"),("g","a"),("h","g"),("i","e"),("j","m"),("k","b"),("l","m"),("m","j")))
I group by the second element to compute a particular statistic, then flatten the result into one list:
val checkItOut = seqRDD.groupBy(each => each._2)
  .map(each => each._2.toList)
  .collect
  .flatten
  .toList
The output looks like this:
checkItOut: List[(String, String)] = List((c,a), (g,a), (a,b), (d,b), (f,b), (k,b), (m,j), (b,c), (e,c), (i,e), (j,m), (l,m), (h,g))
Now, what I'm trying to do is "group" all elements (not pairs) that are connected, directly or indirectly, through any pair into one list. For example: c is paired with a, and a is paired with g, so (a, c, g) are connected. Then c is also paired with b and e, b is paired with a, d, f and k, and those in turn appear with other characters in other pairs. I want all of these in one list.
I know this can be done with a BFS traversal, but I'm wondering whether there is an API in Spark that does this.
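To make the goal concrete, here is a minimal local sketch of the BFS grouping I have in mind. It uses plain Scala collections on the collected pairs, not a Spark API, and the names `ConnectedGroupsSketch` and `connectedGroups` are made up for illustration. It assumes each pair is an undirected link between its two elements.

```scala
import scala.collection.mutable

object ConnectedGroupsSketch {
  // Treat each pair as an undirected edge and return the groups of connected elements.
  def connectedGroups(pairs: Seq[(String, String)]): List[Set[String]] = {
    // Adjacency map: element -> all elements it shares a pair with (both directions).
    val adjacency: Map[String, Set[String]] =
      pairs
        .flatMap { case (x, y) => Seq(x -> y, y -> x) }
        .groupBy(_._1)
        .map { case (node, edges) => node -> edges.map(_._2).toSet }

    val visited = mutable.Set.empty[String]
    val groups = mutable.ListBuffer.empty[Set[String]]

    // Start a BFS from every element that has not been reached yet.
    for (start <- adjacency.keys.toList.sorted) {
      if (!visited(start)) {
        val queue = mutable.Queue(start)
        val group = mutable.Set(start)
        visited += start
        while (queue.nonEmpty) {
          val current = queue.dequeue()
          for (next <- adjacency(current)) {
            if (!visited(next)) {
              visited += next
              group += next
              queue.enqueue(next)
            }
          }
        }
        groups += group.toSet
      }
    }
    groups.toList
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(("a","b"),("b","c"),("c","a"),("d","b"),("e","c"),("f","b"),("g","a"),
      ("h","g"),("i","e"),("j","m"),("k","b"),("l","m"),("m","j"))
    // Print each group with its elements sorted for readability.
    connectedGroups(pairs).foreach(g => println(g.toList.sorted.mkString("(", ",", ")")))
  }
}
```

For the data above this gives two groups, (a,b,c,d,e,f,g,h,i,k) and (j,l,m), which is the result I would like to get from Spark without collecting everything to the driver.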