
Here is the code that counts the number of lines in a file. It works fine with BufferedReader; no problem. In total there are over 25,000,000 rows.

    BufferedReader br = new BufferedReader(new FileReader("C:\\...test.csv"));
    int lineNbr = 0;
    while (br.readLine() != null) {
        lineNbr++;
        if (lineNbr % 1000000 == 0) {
            System.out.println(lineNbr);   // print progress every million lines
        }
    }
    br.close();
    System.exit(0);
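
The same counting loop can also be written with try-with-resources so the reader is always closed (just a sketch; the path is shortened here as above):

    // Sketch: identical line count, but the reader is closed automatically.
    try (BufferedReader br = new BufferedReader(new FileReader("C:\\...test.csv"))) {
        int lineNbr = 0;
        while (br.readLine() != null) {
            lineNbr++;
            if (lineNbr % 1000000 == 0) {
                System.out.println(lineNbr);
            }
        }
    }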

Here is similar code with SuperCSV. It throws an OutOfMemoryError after about line 11,000,000.

    CsvListReader reader = new CsvListReader(new FileReader("C:\\... test.csv"),
            CsvPreference.EXCEL_PREFERENCE);

    List<String> row = reader.read();
    row = reader.read();
    int lineNbr = 0;
    while (reader.read() != null) {
        lineNbr++;
        if (lineNbr % 1000000 == 0) {
            System.out.println(lineNbr);   // print progress every million lines
        }
    }

    reader.close();
    System.exit(0);

What am I doing wrong? How do I correctly read a file with SuperCSV?

john
  • Don't know the framework, but if the `CsvListReader` parses all data eagerly, you'd have your heap memory blow up pretty fast... – Mena Aug 21 '17 at 14:36
  • Yes, it does. But shouldn't the garbage collector take care of this? I don't use the results of the parse. – john Aug 21 '17 at 14:38
  • You cannot really predict when garbage collection takes place. It might be that the parsed data is not de-referenced quickly enough, or at all. – Mena Aug 21 '17 at 14:39

2 Answers


Based on your sample code and a quick review of the SuperCSV code, I don't see any reason for an OutOfMemoryError to be thrown. I suspect you did not post all the information in your sample, or something else is at play.

You can review the source code for SuperCSV here:

I do not see any state being stored that would cause referenced heap memory to grow in a way that could not be garbage collected.

Another possibility is that your CSV file is corrupt, perhaps missing line breaks at some point. The library makes a `readLine` call in at least one location.
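
One quick way to check for that (just a sketch, reusing the plain BufferedReader approach and the shortened path from your question) is to look at the longest line the reader sees; an unusually large maximum would point at missing line breaks:

    // Sketch: report the longest line, to spot missing line breaks in the file.
    BufferedReader br = new BufferedReader(new FileReader("C:\\...test.csv"));
    String line;
    int maxLength = 0;
    while ((line = br.readLine()) != null) {
        maxLength = Math.max(maxLength, line.length());
    }
    br.close();
    System.out.println("Longest line: " + maxLength + " characters");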

kaliatech

The major difference: your first example simply reads a line from the file and discards it.

Your second example does not just read a string; keep in mind that the call to `read()` returns a `List<String>`! Meaning: the CSV reader library is probably doing its job: it is parsing all of your input data. That simply requires far more resources than just reading lines and throwing them away.

So, most likely, the second example creates garbage at such a high rate that the garbage collector can't keep up with it.
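
If those per-row allocations really are the biggest source of garbage, one thing you could try (just a sketch, and assuming SuperCSV 2.x, whose low-level Tokenizer refills the list you pass to readColumns() instead of creating a new one) is to reuse a single list rather than letting read() build a new List<String> for every row:

    // Sketch only (assumes SuperCSV 2.x): the low-level Tokenizer clears and refills
    // the list you pass to readColumns(), so no new List<String> is created per row.
    // The column Strings themselves are still allocated for every row.
    // Uses org.supercsv.io.Tokenizer and org.supercsv.prefs.CsvPreference.
    Tokenizer tokenizer = new Tokenizer(new FileReader("C:\\... test.csv"),
            CsvPreference.EXCEL_PREFERENCE);
    List<String> columns = new ArrayList<>();
    int lineNbr = 0;
    while (tokenizer.readColumns(columns)) {   // returns false at end of file
        lineNbr++;
        if (lineNbr % 1000000 == 0) {
            System.out.println(lineNbr);
        }
    }
    tokenizer.close();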

GhostCat