
I am working with Apache Spark on a query processing engine. The problem I recently faced is that I want to limit the number of elements in an RDD.

I know there is a take function on RDD that can be used to retrieve only the given number of elements. However, after applying this function the result is no longer an RDD.
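
For example (with a hypothetical count of 10):

// take() is an action: it returns a java.util.List, not a JavaRDD
List<Map<String, Object>> first = rdd.take(10);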

It is essential that even after applying this functionality, the result remains an RDD (i.e., it works as a transformation).

So for now, what I did is the following:

public JavaRDD<Map<String, Object>> limitRDD(JavaRDD<Map<String, Object>> rdd, JavaSparkContext context, int number) {
    // take() collects the first `number` elements to the driver,
    // then parallelize() redistributes them as a new RDD
    return context.parallelize(rdd.take(number));
}

I think this is a massive waste of time. However, I can't think of any way to implement this functionality using transformations such as map or filter.

Is there any way to achieve this without doing what I did here?

Thanks

1 Answer


I think sample might be the function you want.
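For instance, a minimal sketch (the fraction and the fixed seed here are assumptions; a fixed seed keeps the sample deterministic for the same input):

// sample() is a transformation, so the result is still a JavaRDD
// withReplacement = false, fraction = 0.1, seed = 42L (fixed for reproducibility)
JavaRDD<Map<String, Object>> sampled = rdd.sample(false, 0.1, 42L);

Note that sample takes a fraction rather than an exact count, so it returns approximately, not exactly, that share of the elements.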

– Ton Torres
  • It's not exactly what I had in mind, but sample would definitely work too. However, for this case, I needed a constant result for the same input – Hyun Joon Kim Dec 09 '15 at 07:37