I am working with Apache Spark to build a query processing engine. The problem I recently faced is that I need to limit the number of elements in an RDD.
I know RDD has a take function that can be used to retrieve a given number of elements, but take is an action: the result is a plain List on the driver, not an RDD.
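For example, this is what I mean:

    List<Map<String, Object>> first10 = rdd.take(10); // a local java.util.List on the driver, no longer an RDD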
It is essential that even after applying this limit, the result remains an RDD, i.e. that the limit behaves like a transformation.
So for now, what I did is the following:
    public JavaRDD<Map<String, Object>> limitRDD(JavaRDD<Map<String, Object>> rdd,
                                                 JavaSparkContext context, int number) {
        // take() is an action: it pulls `number` elements down to the driver,
        // then parallelize() ships them back out to the cluster as a new RDD
        return context.parallelize(rdd.take(number));
    }
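For context, here is roughly how I call it. The app name and the dummy data are just a made-up example; it assumes the usual imports (java.util.*, org.apache.spark.SparkConf, org.apache.spark.api.java.*):

    SparkConf conf = new SparkConf().setAppName("limit-demo").setMaster("local[*]");
    JavaSparkContext context = new JavaSparkContext(conf);

    // a thousand dummy rows, then keep only the first hundred
    List<Map<String, Object>> data = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", i);
        data.add(row);
    }
    JavaRDD<Map<String, Object>> rows = context.parallelize(data);
    JavaRDD<Map<String, Object>> firstHundred = limitRDD(rows, context, 100);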
I think this is a massive waste of time, since take() collects the elements to the driver only for parallelize() to redistribute them again. However, I can't think of any way to implement this using transformations such as map or filter, because they operate per element and have no view of a global position or count.
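The closest thing to a transformation-only limit I could sketch uses zipWithIndex, but I'm not sure it's actually better, since as far as I know zipWithIndex itself triggers a job to compute per-partition sizes (this assumes import scala.Tuple2 and that the ordering zipWithIndex produces is acceptable):

    JavaRDD<Map<String, Object>> limited = rdd
        .zipWithIndex()                   // pair each element with its index: (elem, 0), (elem, 1), ...
        .filter(t -> t._2() < number)     // keep only the first `number` indices
        .map(Tuple2::_1);                 // drop the index again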
Is there any way to achieve this without the driver round trip I used here?
Thanks