If you're on Linux and you can run the CSV through a script first, then you can use "split":
$ split -l 100000 big.csv small-
This generates files named small-aa, small-ab, small-ac, and so on. To give them a .csv extension if needed:
$ for a in small-*; do
  mv "$a" "$a.csv";              # rename split files to .csv
  java MyCSVProcessor "$a.csv";  # or just process them anyway
done
Try this for additional options:
$ split --help
-a, --suffix-length=N      use suffixes of length N (default 2)
-b, --bytes=SIZE           put SIZE bytes per output file
-C, --line-bytes=SIZE      put at most SIZE bytes of lines per output file
-d, --numeric-suffixes     use numeric suffixes instead of alphabetic
-l, --lines=NUMBER         put NUMBER lines per output file
This is, however, a poor mitigation for your problem. Your CSV reader module is running out of memory because it either reads the whole file into memory before splitting it into rows, or does that and also keeps all of your processed output in memory. To make your code more portable and runnable on any machine, consider processing the input one line at a time and splitting each line yourself. (From https://stackabuse.com/reading-and-writing-csvs-in-java/)
// try-with-resources closes the reader even if processing throws
try (BufferedReader csvReader = new BufferedReader(new FileReader(pathToCsv))) {
    String row;
    while ((row = csvReader.readLine()) != null) {
        String[] data = row.split(",");
        // do something with the data
    }
}
The caveat with the above code is that commas inside quoted fields are treated as column separators, so you will have to add some extra processing if your CSV data contains quoted commas.
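If you'd rather not pull in a CSV library, a hand-rolled splitter like the one below could stand in for row.split(",") in the loop above. This is a minimal sketch, assuming RFC 4180-style quoting (fields wrapped in double quotes, with "" for a literal quote) and assuming no quoted field contains a line break, since readLine() hands it one physical line at a time. The class CsvLine and its parse method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLine {
    /**
     * Splits one CSV line into fields. Commas inside double-quoted fields are
     * kept, and "" inside a quoted field becomes a single literal quote.
     * Assumption: quoted fields never span multiple lines.
     */
    public static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // "" inside quotes is an escaped quote
                        i++;               // skip the second quote
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas land here and are kept
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString()); // end of the current field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("1,\"Smith, John\",\"say \"\"hi\"\"\","));
    }
}
```

In the reading loop you would call CsvLine.parse(row) instead of row.split(","). For data with embedded line breaks in quoted fields, a proper CSV parser is the safer choice.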
Of course, if you really want to keep your existing code and just want to split the file, you can adapt the readLine() loop above:
import java.io.*;

public class Split {
    static final String CSV_FILE = "test.csv";
    static final int MAX_LINES_PER_FILE = 100000;

    public static void main(String[] args) throws IOException {
        try (BufferedReader csvReader = new BufferedReader(new FileReader(CSV_FILE))) {
            PrintWriter csvWriter = null;
            String row;
            int line = 0;
            while ((row = csvReader.readLine()) != null) {
                if (line % MAX_LINES_PER_FILE == 0) {
                    // start a new output file every MAX_LINES_PER_FILE lines
                    if (csvWriter != null) { csvWriter.close(); }
                    csvWriter = new PrintWriter("cut-" + line + CSV_FILE);
                }
                csvWriter.println(row);
                // String[] data = row.split(",");
                // do something with the data
                line++;
            }
            // an empty input file never opens a writer, so guard against null
            if (csvWriter != null) { csvWriter.close(); }
        }
    }
}
I chose PrintWriter over FileWriter or BufferedWriter because its println() appends the line separator for you, and the PrintWriter(String fileName) constructor wraps the file in a BufferedWriter, so the output is buffered too. I've not written anything in Java in 20 years, so I bet you can improve on the above.
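One possible refinement of the splitter, sketched under a few assumptions: the first line of the input is a header that your existing processor expects to see in every chunk, the file is UTF-8, and the class name CsvChunker, the method splitCsv and the chunk-N.csv naming scheme are all hypothetical. It uses java.nio.file.Files readers and writers with an explicit charset and makes sure every file gets closed. Like the code above, it splits on physical lines, so a quoted field containing a line break would be cut in two.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CsvChunker {
    /**
     * Streams input line by line into outDir/chunk-0.csv, chunk-1.csv, ...
     * Each chunk gets a copy of the header line plus at most maxRows data rows.
     * Returns the number of chunk files written (0 if there are no data rows).
     */
    public static int splitCsv(Path input, Path outDir, int maxRows) throws IOException {
        if (maxRows < 1) {
            throw new IllegalArgumentException("maxRows must be at least 1");
        }
        int chunks = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String header = reader.readLine();
            if (header == null) {
                return 0; // empty input: nothing to write
            }
            BufferedWriter writer = null;
            int rowsInChunk = 0;
            try {
                String row;
                while ((row = reader.readLine()) != null) {
                    if (writer == null || rowsInChunk == maxRows) {
                        if (writer != null) {
                            writer.close(); // finish the previous chunk
                        }
                        writer = Files.newBufferedWriter(
                                outDir.resolve("chunk-" + chunks + ".csv"), StandardCharsets.UTF_8);
                        writer.write(header); // repeat the header in every chunk
                        writer.newLine();
                        chunks++;
                        rowsInChunk = 0;
                    }
                    writer.write(row);
                    writer.newLine();
                    rowsInChunk++;
                }
            } finally {
                if (writer != null) {
                    writer.close(); // close the last chunk even on error
                }
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java CsvChunker <file.csv> [rowsPerChunk]");
            return;
        }
        int rows = args.length > 1 ? Integer.parseInt(args[1]) : 100000;
        int n = splitCsv(Paths.get(args[0]), Paths.get("."), rows);
        System.out.println("wrote " + n + " chunk files");
    }
}
```

For example, java CsvChunker big.csv 100000 would write chunk-0.csv, chunk-1.csv, and so on into the current directory, each starting with the header line. Only one line is held in memory at a time, so memory use stays flat regardless of file size.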