I have an RDD tagIds with the following content:

(name_1, Set_1),
...
(name_t, Set_t),
...

I want to build pairs of these pairs, ((name_i, Set_i), (name_j, Set_j)), but only when Set_i.intersect(Set_j).size > 0, i.e. when the two sets share at least one element.

The only way I managed to do that is:

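  // Attach a unique index to each (name, Set) pair
  val withInd = tagIds.zipWithIndex()
  // Every element paired with every element: n * n candidates
  val tagIdsZipped = withInd.cartesian(withInd)
    .filter {
      // keep each unordered pair once (a's index < b's index) and only if the sets overlap
      case (a, b) => a._2 < b._2 && a._1._2.intersect(b._1._2).nonEmpty
    }
    .map {
      // drop the indices, leaving ((name_i, Set_i), (name_j, Set_j))
      case (a, b) => (a._1, b._1)
    }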

I wonder whether there is a more efficient way to do this. I think cartesian produces far too many pairs (n * n for n elements), most of which the filter then discards, and I want to avoid that.
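One alternative that avoids comparing records with disjoint sets is an inverted index: emit (tag, record) for every tag in every set, group by tag, and pair up only records that land in the same group. The sketch below is a minimal illustration under assumptions: it uses plain Scala collections rather than an RDD so it runs on its own, names are assumed to be Strings and tags Ints, and SharedTagPairs / pairsSharingTag are hypothetical names. On an RDD the same steps would roughly map onto zipWithIndex, flatMap, groupByKey, flatMap and a final deduplication (e.g. distinct or reduceByKey on the index pair).

```scala
// Sketch on plain Scala collections (not Spark); names are hypothetical.
// Finds pairs of (name, tagSet) records whose sets share at least one tag,
// via an inverted index (tag -> records) instead of a full cartesian product.
object SharedTagPairs {
  type Rec = (String, Set[Int]) // assumed: names are Strings, tags are Ints

  def pairsSharingTag(tagIds: Seq[Rec]): Seq[(Rec, Rec)] = {
    // Attach an index so each unordered pair can be kept exactly once
    val indexed: Seq[(Rec, Int)] = tagIds.zipWithIndex

    // Inverted index: for every tag, the indexed records that contain it
    val byTag: Map[Int, Seq[(Rec, Int)]] =
      indexed
        .flatMap { case (rec @ (_, tags), i) => tags.toSeq.map(t => (t, (rec, i))) }
        .groupBy(_._1)
        .map { case (tag, entries) => tag -> entries.map(_._2) }

    // Records in the same group share that tag, so only they are compared
    val candidates = byTag.values.flatMap { group =>
      for {
        (a, i) <- group
        (b, j) <- group
        if i < j
      } yield ((i, j), (a, b))
    }

    // A pair sharing several tags appears once per shared tag; keep one per index pair
    candidates.toMap.values.toSeq
  }
}
```

This only pays off when the sets are sparse relative to each other: a tag shared by k records still yields k(k-1)/2 candidate pairs in its group, and if most records share some tag the result itself is close to n * n pairs, which no method can avoid.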

Thank you in advance!

elfinorr