Given an RDD, what's the best way to sort it and then consume it in fixed-size chunks? For example:
JavaRDD<Integer> baseRdd = sc.parallelize(Arrays.asList(1,2,5,3,4));
JavaRDD<Integer> sorted = baseRdd.sortBy(x -> x, true, 5);
// returns 1, 2
List<Integer> first = sorted.take(2);
// also returns 1, 2. How can I skip the first 2 elements and then take the next 2?
List<Integer> second = sorted.take(2);
What I would really like is to consume 1, 2 on the first call to take(2), and then have some sort of "skip" parameter that gets passed into the second take(2) to return 3, 4.
Since that "skip" function doesn't seem to exist in the current RDD functionality, what would be the most efficient way to split up the sorted RDD into chunks of known size that can be independently acted on?
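To make the desired "skip, then take" behavior concrete, here is a minimal sketch of one possible approach, not necessarily the most efficient one: attach a position to each sorted element with zipWithIndex() and filter on an index range. The class name SortedChunks and the helper chunk(...) are hypothetical names introduced for illustration and are not part of Spark; the local master setting is also an assumption so the snippet can run standalone.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SortedChunks {

    // Hypothetical helper (not a Spark API): returns the elements whose
    // position in the indexed RDD falls in the range [skip, skip + size).
    static <T> List<T> chunk(JavaPairRDD<T, Long> indexed, long skip, int size) {
        return indexed
                .filter(pair -> pair._2() >= skip && pair._2() < skip + size)
                .keys()
                .collect();
    }

    public static void main(String[] args) {
        // Assumption: a local master, so the sketch runs without a cluster.
        SparkConf conf = new SparkConf().setAppName("SortedChunks").setMaster("local[2]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> baseRdd = sc.parallelize(Arrays.asList(1, 2, 5, 3, 4));
            JavaRDD<Integer> sorted = baseRdd.sortBy(x -> x, true, 5);

            // Pair every element with its position in the sorted order.
            // Cached because each chunk request scans the indexed RDD again.
            JavaPairRDD<Integer, Long> indexed = sorted.zipWithIndex().cache();

            List<Integer> first = chunk(indexed, 0, 2);   // positions 0 and 1
            List<Integer> second = chunk(indexed, 2, 2);  // positions 2 and 3

            System.out.println(first + " " + second);
        }
    }
}
```

With the sample data, the first call should yield 1, 2 and the second 3, 4. Note that every chunk request still filters the whole indexed RDD, so this only emulates a "skip" parameter; whether that is efficient enough depends on the data size and on how many chunks are consumed.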