I recently needed to sort a one-line file (integers separated by ",") into smaller sorted chunks, with a memory restriction and efficiency in mind. I'm currently following this logic:
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;

File file = new File("bigfile.txt");
try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
    final int BUFFER_SIZE = 10; // can and should be bigger
    byte[] bytes = new byte[BUFFER_SIZE];
    int bytesRead; // how many bytes this read actually returned
    while ((bytesRead = bis.read(bytes)) != -1) {
        // convert the first bytesRead bytes to a String
        // prepend the fragment saved from the previous round, if any
        // split the String on "," into a String[]
        // if the last number was cut off mid-read, remove it from the String[]
        //   and save it as the fragment for the next round
        // sort the String[]
        // write the sorted String[] into a chunk file
        // call the garbage collector to avoid running out of memory?
    }
} // try-with-resources closes the stream even if an exception is thrown
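
To make the loop concrete, here is a minimal runnable sketch of that logic, assuming the input is plain ASCII; the ChunkSorter class name, the chunk0.txt/chunk1.txt/... output names, and the 1M-char chunk size are placeholders of mine, not fixed requirements. Since a read can end in the middle of a number, the sketch always defers the last token of each round and flushes it after EOF:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ChunkSorter {
    // ~1M chars per chunk; tune so the parsed chunk fits the heap limit
    static final int CHUNK_CHARS = 1 << 20;

    public static void main(String[] args) throws IOException {
        int chunkCount = 0;
        String carry = ""; // trailing token that may have been cut mid-number
        char[] buf = new char[CHUNK_CHARS];
        try (BufferedReader in = new BufferedReader(new FileReader("bigfile.txt"))) {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                String text = carry + new String(buf, 0, n);
                // The last token may be incomplete, so always defer it to the
                // next round; it is flushed after EOF if anything is left.
                int lastComma = text.lastIndexOf(',');
                if (lastComma >= 0) {
                    carry = text.substring(lastComma + 1);
                    text = text.substring(0, lastComma);
                } else {
                    carry = text; // no separator seen: still inside one token
                    text = "";
                }
                if (!text.isEmpty()) {
                    writeSortedChunk(text, chunkCount++);
                }
            }
            if (!carry.isEmpty()) {
                writeSortedChunk(carry, chunkCount); // final dangling number
            }
        }
    }

    // Parse a comma-separated run of integers, sort numerically, write one chunk file.
    static void writeSortedChunk(String csv, int index) throws IOException {
        List<Integer> numbers = new ArrayList<>();
        for (String token : csv.split(",")) {
            if (!token.trim().isEmpty()) {
                numbers.add(Integer.parseInt(token.trim()));
            }
        }
        Collections.sort(numbers); // numeric order, not lexicographic String order
        try (PrintWriter out = new PrintWriter(new FileWriter("chunk" + index + ".txt"))) {
            for (Integer number : numbers) {
                out.println(number);
            }
        }
    }
}

Two deliberate differences from the pseudocode above: the tokens are parsed with Integer.parseInt before sorting, because sorting the String[] directly would put "10" before "9"; and there is no explicit garbage-collector call, since each iteration's objects become unreachable on their own and System.gc() would not prevent a leak anyway.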
Assuming I'm restricted to 5MB of memory and have to read a one-line file with 10,000,000 integers separated by "," (see the rough arithmetic after this list):
- If I use a very small buffer size (e.g. 10 bytes) to read the file, I'll create an enormous number of tiny files.
- If I use a decent but still small buffer size (e.g. 100KB), I will still get a lot of files, on the order of hundreds.
- If I use a bigger buffer size (e.g. 4MB), I will run into heap problems when splitting and sorting the chunk in memory, because of the 5MB restriction.
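
For rough scale (the average width is a guess, since the value range isn't specified): at roughly seven digits plus a comma per integer, 10,000,000 integers come to about 80MB of input. 100KB chunks would therefore mean on the order of 800 files; 4MB chunks would mean only about 20, but each 4MB chunk balloons well past 4MB once it is turned into String and Integer objects, which is what breaks the 5MB limit.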
What is the best approach to end up with the fewest sorted files (or the biggest possible chunk of data per file)?