
I have a huge file stored in S3 that I am loading into my Spark cluster, and I want to invoke a custom Java library which takes an input file location, processes the data, and writes to a given output location. However, I cannot rewrite that custom logic in Spark.

I am trying to see whether I can load the file from S3, save each partition to local disk, and hand that location to the custom Java app; once it has been processed, I would load all the partitions back and save the result to S3.

Is this possible? From what I have read so far, it looks like I need to use the RDD API, but I couldn't find more info on how to save each partition to local disk.
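For context, here is a rough sketch of the kind of pipeline I am imagining. `CustomProcessor.process` and the bucket paths are placeholders I made up, not the real library API:

```scala
import java.nio.file.Files
import scala.collection.JavaConverters._
import org.apache.spark.sql.SparkSession

// Placeholder for the real library entry point; the actual class and
// method names would come from the custom Java library.
object CustomProcessor {
  def process(inputPath: String, outputPath: String): Unit = ???
}

object RunCustomLib {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("run-custom-lib").getOrCreate()
    val sc = spark.sparkContext

    val input = sc.textFile("s3a://my-bucket/input/huge-file.txt")

    val processed = input.mapPartitions { lines =>
      // Spill this partition to a temp file on the executor's local disk.
      val localIn  = Files.createTempFile("part-in-", ".txt")
      val localOut = Files.createTempFile("part-out-", ".txt")
      val writer = Files.newBufferedWriter(localIn)
      lines.foreach { line => writer.write(line); writer.newLine() }
      writer.close()

      // Hand the local input/output paths to the custom Java library.
      CustomProcessor.process(localIn.toString, localOut.toString)

      // Stream the processed file back as this partition's records.
      Files.lines(localOut).iterator().asScala
    }

    // Reassemble all partitions and write the result back to S3.
    processed.saveAsTextFile("s3a://my-bucket/output/")
    spark.stop()
  }
}
```

Is something along these lines workable, or is there a better-supported way to do it?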

Appreciate any inputs.

Never mind, I was able to resolve the issue using a different approach. I wrote a simple MapReduce job and ran it on EMR, which turned out to be much simpler than trying to achieve the same thing with Spark. – Sateesh K Mar 25 '18 at 02:48
