
Here is the code that counts the number of lines in a file. It works fine with BufferedReader; no problem. In total there are over 25,000,000 rows.

    BufferedReader br = new BufferedReader(new FileReader("C:\\...test.csv"));
    int lineNbr = 0;
    while (br.readLine() != null) {
        lineNbr++;
        if (lineNbr % 1000000 == 0) {
            System.out.println(lineNbr);   // print progress every million lines
        }
    }
    br.close();
    System.exit(0);
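
The same counting loop can also be written with try-with-resources so the reader is always closed (just a sketch; the path is shortened here as above):

    // Sketch: identical line count, but the reader is closed automatically.
    try (BufferedReader br = new BufferedReader(new FileReader("C:\\...test.csv"))) {
        int lineNbr = 0;
        while (br.readLine() != null) {
            lineNbr++;
            if (lineNbr % 1000000 == 0) {
                System.out.println(lineNbr);
            }
        }
    }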

Here is similar code with SuperCSV. It throws an OutOfMemoryError after about line 11,000,000.

    CsvListReader reader = new CsvListReader(new FileReader("C:\\... test.csv"),
            CsvPreference.EXCEL_PREFERENCE);

    List<String> row = reader.read();
    row = reader.read();
    int lineNbr = 0;
    while (reader.read() != null) {
        lineNbr++;
        if (lineNbr % 1000000 == 0) {
            System.out.println(lineNbr);   // print progress every million lines
        }
    }

    reader.close();
    System.exit(0);

What am I doing wrong? How do I correctly read a file with SuperCSV?

john
  • Don't know the framework, but if the `CsvListReader` parses all data eagerly, you'd have your heap memory blow up pretty fast... – Mena Aug 21 '17 at 14:36
  • Yes, it does. But shouldn't the garbage collector take care of this? I don't use the results of the parse. – john Aug 21 '17 at 14:38
  • You cannot really predict when garbage collection takes place. It might be that the parsed data is not de-referenced quickly enough, or at all. – Mena Aug 21 '17 at 14:39

2 Answers


Based on your sample code and a quick review of the SuperCSV code, I don't see any reason for an OutOfMemoryError to be thrown. I suspect you did not post all the information in your sample, or something else is at play.

You can review the source code for SuperCSV here:

I do not see any state being stored that would cause referenced heap memory to grow in a way that could not be garbage collected.

Another possibility is that your CSV file is corrupt, perhaps missing line breaks at some point. The library makes a `readLine` call in at least one location.
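
One quick way to check for that (just a sketch, reusing the plain BufferedReader approach and the shortened path from your question) is to look at the longest line the reader sees; an unusually large maximum would point at missing line breaks:

    // Sketch: report the longest line, to spot missing line breaks in the file.
    BufferedReader br = new BufferedReader(new FileReader("C:\\...test.csv"));
    String line;
    int maxLength = 0;
    while ((line = br.readLine()) != null) {
        maxLength = Math.max(maxLength, line.length());
    }
    br.close();
    System.out.println("Longest line: " + maxLength + " characters");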

kaliatech

The major difference: your first example simply reads a line from the file and discards it.

Your second example does not just read a string; keep in mind that the call to `read()` returns a `List<String>`! Meaning: the CSV reader library is probably doing its job: it is parsing all of your input data. That simply requires far more resources than just reading lines and throwing them away.

So, most likely, the second example creates garbage at such a high rate that the garbage collector can't keep up with it.
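
If those per-row allocations really are the biggest source of garbage, one thing you could try (just a sketch, and assuming SuperCSV 2.x, whose low-level Tokenizer refills the list you pass to readColumns() instead of creating a new one) is to reuse a single list rather than letting read() build a new List<String> for every row:

    // Sketch only (assumes SuperCSV 2.x): the low-level Tokenizer clears and refills
    // the list you pass to readColumns(), so no new List<String> is created per row.
    // The column Strings themselves are still allocated for every row.
    // Uses org.supercsv.io.Tokenizer and org.supercsv.prefs.CsvPreference.
    Tokenizer tokenizer = new Tokenizer(new FileReader("C:\\... test.csv"),
            CsvPreference.EXCEL_PREFERENCE);
    List<String> columns = new ArrayList<>();
    int lineNbr = 0;
    while (tokenizer.readColumns(columns)) {   // returns false at end of file
        lineNbr++;
        if (lineNbr % 1000000 == 0) {
            System.out.println(lineNbr);
        }
    }
    tokenizer.close();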

GhostCat