I'm new to Spark (although I have Hadoop and MapReduce experience) and am trying to process a giant file with one JSON record per line. I'd like to apply a transformation to each line and write an output file for every n records (say, 1 million). So if there are 7.5 million records in the input file, 8 output files should be generated. How can I do this? An answer in either Java or Scala would be fine.
Using Spark v2.1.0.
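
For context, here's a rough sketch of the direction I've been considering: use the RDD API with `zipWithIndex` to assign each record to a chunk of n records, then a `HashPartitioner` so each chunk lands in its own partition (and hence its own output part file). The paths, app name, and the identity transform are placeholders, and I'm not sure this is idiomatic or efficient:

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object ChunkedOutput {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ChunkedOutput").getOrCreate()
    val sc = spark.sparkContext

    val recordsPerFile = 1000000L                     // n
    val input = sc.textFile("hdfs:///path/to/input")  // placeholder path

    // Placeholder: the real per-line JSON transformation goes here
    val transformed = input.map(line => line)

    // Assign a stable global index to each record
    val indexed = transformed.zipWithIndex().cache()

    // One chunk per n records; 7.5M records with n = 1M gives 8 chunks
    val numChunks = math.ceil(indexed.count().toDouble / recordsPerFile).toInt

    indexed
      .map { case (line, idx) => ((idx / recordsPerFile).toInt, line) }
      .partitionBy(new HashPartitioner(numChunks))    // one partition per chunk key
      .values
      .saveAsTextFile("hdfs:///path/to/output")       // one part-* file per partition
  }
}
```

My concern is that `zipWithIndex` and `count` each trigger an extra pass over a very large input, so I'd be happy to hear of a cheaper or more standard approach.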