I have a large list in a JavaPairRDD<Integer, List<String>> and I want to do a flatMap to get all possible pairwise combinations of the list entries, so that I end up with a JavaPairRDD<Integer, Tuple2<String, String>>. Basically, if I have something like
(1, ["A", "B", "C"])
I want to get:
(1, <"A","B">)
(1, <"A", "C">)
(1, <"B", "C">)
The problem is with large lists: what I have done so far is build the full list of Tuple2 objects with a nested loop over the input list, and sometimes that list does not fit in memory. I found this question, but I am not sure how to implement it in Java: Spark FlatMap function for huge lists
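
To make the idea concrete, here is a minimal sketch of what a lazy version might look like in plain Java: instead of building the whole list of pairs, it hands back an Iterable whose iterator produces one pair at a time. The class name LazyPairs and the method of are made up for illustration, and Map.Entry stands in for Tuple2 so the snippet compiles without Spark on the classpath. It also assumes the input list has fast random access (e.g. an ArrayList).

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical helper (not part of Spark): yields every pair (items[i], items[j])
// with i < j on demand, so the full list of pairs is never held in memory.
final class LazyPairs {
    private LazyPairs() {}

    static Iterable<Map.Entry<String, String>> of(final List<String> items) {
        // Each call to iterator() starts a fresh walk over the index pairs.
        return () -> new Iterator<Map.Entry<String, String>>() {
            private int i = 0; // index of the first element of the next pair
            private int j = 1; // index of the second element of the next pair

            @Override
            public boolean hasNext() {
                // j is always i + 1 or later, so j in range means a pair remains.
                return j < items.size();
            }

            @Override
            public Map.Entry<String, String> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Map.Entry<String, String> pair =
                        new SimpleImmutableEntry<>(items.get(i), items.get(j));
                // Advance the inner index; when it runs off the end,
                // move the outer index forward and restart the inner one.
                j++;
                if (j >= items.size()) {
                    i++;
                    j = i + 1;
                }
                return pair;
            }
        };
    }
}
```

In a Spark job the iterator would presumably construct new Tuple2<>(items.get(i), items.get(j)) instead of a Map.Entry, and the result would be returned from the function passed to flatMapValues. Whether that function has to return an Iterable or an Iterator depends on the Spark version, so that part is an assumption to check against the Javadoc for the version in use. Generating pairs lazily only avoids holding them all at once inside the flatMap step; later steps that collect or group the pairs can still need the memory.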