
Suppose I have a very large sequence file, but I only want to work with the first 1000 rows locally. How can I do that?

Currently my code looks like this:

JavaPairRDD<IntWritable,VectorWritable> seqVectors = sc.sequenceFile(inputPath, IntWritable.class, VectorWritable.class);
user3086871

1 Answer


Take the first 1000 elements to the driver, then parallelize the resulting list back into an RDD:

// take(1000) returns a List<Tuple2<K, V>>; parallelizePairs turns it back into a JavaPairRDD
JavaPairRDD<IntWritable,VectorWritable> rddWith1000 = sc.parallelizePairs(seqVectors.take(1000));
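As a rough, self-contained sketch of the same take-then-parallelize pattern, the example below runs in local mode. It uses `Integer`/`String` pairs instead of `IntWritable`/`VectorWritable`, since Hadoop Writables are not `java.io.Serializable` and Hadoop's record reader reuses Writable instances; with a real sequence file you would probably want to map each record to a serializable copy before calling `take`. The class name `TakeFirstN` and the helper `firstN` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class TakeFirstN {

    // Hypothetical helper: builds a new pair RDD from the first n elements of rdd.
    static <K, V> JavaPairRDD<K, V> firstN(JavaSparkContext sc, JavaPairRDD<K, V> rdd, int n) {
        // take(n) brings at most n elements back to the driver as a List
        List<Tuple2<K, V>> head = rdd.take(n);
        // parallelizePairs redistributes that small list as a JavaPairRDD
        return sc.parallelizePairs(head);
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("TakeFirstN").setMaster("local[2]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Stand-in for the large input: 5000 (index, label) pairs in 4 partitions
            List<Tuple2<Integer, String>> rows = new ArrayList<>();
            for (int i = 0; i < 5000; i++) {
                rows.add(new Tuple2<>(i, "row-" + i));
            }
            JavaPairRDD<Integer, String> big = sc.parallelizePairs(rows, 4);

            JavaPairRDD<Integer, String> small = firstN(sc, big, 1000);
            System.out.println("Rows kept: " + small.count());
        }
    }
}
```

Because `take` only scans as many partitions as it needs to find `n` elements, this usually avoids reading the whole file, but the `n` elements do have to fit in driver memory.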


Ronak Patel