I have an RDD tagIds with the following content:
(name_1, Set_1),
...
(name_t, Set_t),
...
I want to create pairs of records ((name_i, Set_i), (name_j, Set_j)), but only when Set_i.intersect(Set_j).size > 0, i.e. when the two sets share at least one element.
The only way I have managed to do this is:
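// Attach a unique index to every record so each unordered pair is kept only once
val withInd = tagIds.zipWithIndex()
val tagIdsZipped = withInd.cartesian(withInd)
  .filter {
    // a._2 and b._2 are the indices; a._1._2 and b._1._2 are the sets
    case (a, b) => a._2 < b._2 && a._1._2.intersect(b._1._2).size > 0
  }
  .map {
    // Drop the indices and keep only the original (name, Set) records
    case (a, b) => (a._1, b._1)
  }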
I wonder whether there is a more efficient way to do this. I think cartesian produces far too many pairs that are then discarded by the filter, and I would like to avoid that.
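For illustration, one direction that may avoid the full cartesian product is an inverted index: key every record by the elements of its set, so that only records sharing at least one element ever meet in a join. The following is a minimal sketch, not a tested solution. It assumes the names are Strings and the set elements are Ints, and the helper name intersectingPairs is hypothetical. If some element occurs in very many sets, the self-join can still be large, so whether this beats cartesian depends on the data.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper; the types (String names, Set[Int] sets) are assumptions.
def intersectingPairs(
    tagIds: RDD[(String, Set[Int])]
): RDD[((String, Set[Int]), (String, Set[Int]))] = {
  // Give every record a unique index and use it as the key.
  val withInd = tagIds.zipWithIndex().map { case (rec, idx) => (idx, rec) }

  // Inverted index: one (element, recordIndex) entry per element of each set.
  val byElement = withInd.flatMap { case (idx, (_, set)) => set.map(e => (e, idx)) }

  // Self-join on the element, so only records sharing an element are paired.
  // i < j keeps each unordered pair once; distinct() removes duplicates that
  // arise when two sets share more than one element.
  val candidateIds = byElement.join(byElement)
    .map { case (_, (i, j)) => (i, j) }
    .filter { case (i, j) => i < j }
    .distinct()

  // Attach the original records back to the index pairs.
  candidateIds
    .join(withInd)                              // (i, (j, recI))
    .map { case (_, (j, recI)) => (j, recI) }
    .join(withInd)                              // (j, (recI, recJ))
    .map { case (_, (recI, recJ)) => (recI, recJ) }
}
```

With this sketch, intersectingPairs(tagIds) would take the place of tagIdsZipped above. No explicit intersect check is needed, because a pair is produced only when both records were indexed under the same element.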
Thank you in advance!