If you have an RDD of tuples, however those tuples are represented, you can use mapToPair
to transform it into a PairRDD with whatever key and value you prefer.
In Java 8 this could be:
JavaPairRDD<Integer,List<String>> r =
    rddOfTuples.mapToPair((t)->new Tuple2<>(
        extractKey(t),
        extractTuples(t)
    ));
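Note that mapToPair itself is a narrow transformation and does not introduce a shuffle; a shuffle only happens if you later apply a key-based operation such as groupByKey or reduceByKey.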
To state the obvious, extractKey and extractTuples are methods you have to implement yourself; they extract the parts of the original tuple as needed.
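As a rough illustration of what those two helpers might look like, here is a minimal sketch. It assumes each input tuple is represented as a String[] (say, a split CSV line) whose first field is the integer key. That representation and the class name TupleExtractors are assumptions made for the example, not something taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers; assumes each "tuple" is a String[] with the key in field 0.
final class TupleExtractors {

    // The key is the first field, parsed as an integer.
    public static Integer extractKey(String[] t) {
        return Integer.parseInt(t[0].trim());
    }

    // The value is every remaining field, copied into a fresh list.
    public static List<String> extractTuples(String[] t) {
        return new ArrayList<>(Arrays.asList(t).subList(1, t.length));
    }

    public static void main(String[] args) {
        String[] row = "7,a,b,c".split(",");
        // prints: 7 -> [a, b, c]
        System.out.println(extractKey(row) + " -> " + extractTuples(row));
    }
}
```

For a different tuple representation only the bodies of the two methods should need to change; the mapToPair call stays the same.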
With my limited knowledge of Scala tuples, and assuming the input is something like scala.Tuple5<String,Integer,Integer,Integer,Integer>, this could be:
JavaPairRDD<String,List<Integer>> r =
    rddOfTuples.mapToPair((t)->new Tuple2<>(
        t._1(),                                    // the String becomes the key
        Arrays.asList(t._2(),t._3(),t._4(),t._5()) // a Tuple5 has no _6
    ));
If, however, you do not know the arity (number of elements) of your tuple beforehand, then in Scala terms it is a Product. To access its elements dynamically, you will need to use the Product interface, with a choice of:
int productArity()
Object productElement(int n)   // n is zero-based
scala.collection.Iterator<Object> productIterator()
Then it becomes a regular Java exercise:
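JavaPairRDD<Integer,List<String>> r =
    rddOfTuples.mapToPair((t)->{
        // productElement is zero-based: element 0 is the key, the rest become the value
        List<String> l = new ArrayList<>(t.productArity()-1);
        for (int i = 1; i < t.productArity(); i++) {
            l.add(String.valueOf(t.productElement(i))); // add, not set: the list starts out empty
        }
        // assumes the first element is an Integer
        return new Tuple2<>((Integer) t.productElement(0),l);
    });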
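To try the dynamic-arity logic without Spark or Scala on the classpath, the loop can be pulled into a small helper that only needs the arity and an index-based accessor. This is a sketch, not part of any Spark or Scala API: the class ProductSplitter and the method splitKeyAndRest are hypothetical names. It assumes element 0 is the key and that the remaining elements can be rendered as strings.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical helper: splits an indexable, tuple-like value into
// (first element, remaining elements rendered as strings).
final class ProductSplitter {

    public static Map.Entry<Object, List<String>> splitKeyAndRest(
            int arity, IntFunction<Object> elementAt) {
        if (arity < 1) {
            throw new IllegalArgumentException("need at least one element for the key");
        }
        List<String> rest = new ArrayList<>(arity - 1);
        for (int i = 1; i < arity; i++) {
            rest.add(String.valueOf(elementAt.apply(i)));
        }
        return new SimpleImmutableEntry<>(elementAt.apply(0), rest);
    }

    public static void main(String[] args) {
        // An Object[] stands in for a tuple of unknown arity.
        Object[] fake = {42, "a", 3.5};
        Map.Entry<Object, List<String>> e = splitKeyAndRest(fake.length, i -> fake[i]);
        // prints: 42 -> [a, 3.5]
        System.out.println(e.getKey() + " -> " + e.getValue());
    }
}
```

With a real scala.Product t, the call would presumably be splitKeyAndRest(t.productArity(), t::productElement), and the resulting entry can then be wrapped in a Tuple2 inside mapToPair.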
I hope I have it all right. The code above is untested and uncompiled, so if you get it to work with corrections, feel free to apply them to this answer.