
I want to print each line from a huge text file (more than 600 000 MB).

But when I try the code below I get "...OutOfMemoryError: Java heap space" right before reaching line number 1 000 000.

Is there a better way to handle the input rather than FileReader and LineNumberReader?

FileReader fReader = new FileReader(new File("C:/huge_file.txt"));
LineNumberReader lnReader = new LineNumberReader(fReader);
String line = "";
while ((line = lnReader.readLine()) != null) {
    System.out.println(lnReader.getLineNumber() + ": " + line);
}
lnReader.close(); // closing the LineNumberReader also closes the underlying FileReader

Thanks in advance!


Thanks all for your answers!

I finally found the memory leak: an unused Java class instance that duplicated itself on every row iteration. In other words, it had nothing to do with the file-loading part.
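The kind of leak the OP describes can be sketched roughly like this (`LeakDemo` and its per-row scratch list are hypothetical, with a small in-memory string standing in for the real file): a collection that accumulates something on every row and is never cleared grows until the heap is exhausted, even though the reading itself uses constant memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LeakDemo {
    // Streams lines from an in-memory source; returns how many lines were processed.
    static int countLines(String input) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(input));
        List<String> perRowScratch = new ArrayList<>();
        String line;
        int count = 0;
        while ((line = reader.readLine()) != null) {
            perRowScratch.add(line.toUpperCase()); // some per-row work
            // Forgetting this clear() makes the list grow once per row --
            // over millions of rows that is an OutOfMemoryError, and the
            // file-reading code looks like the culprit when it isn't.
            perRowScratch.clear();
            count++;
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines("a\nb\nc"));
    }
}
```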

  • The code you have will not produce an OutOfMemoryError; what you are doing with the data is far more likely to be the cause of the problem. – Peter Lawrey May 03 '11 at 11:10
  • With a text file that huge, is it possible that you just have a very large line in there? – Björn Pollex May 03 '11 at 11:11
  • Also, managing a 600 GB text file is going to be slow and cumbersome. You should consider using smaller text files. – Peter Lawrey May 03 '11 at 11:11
  • As SpaceCowboy suggests, you need about 5x the longest line in free memory. Try `System.out.print(lnReader.getLineNumber() + ": "); System.out.println(line);` as it avoids creating a second StringBuilder/String with the whole line and more in it. – Peter Lawrey May 03 '11 at 11:13
  • @Peter Lawrey: who do you mean by SpaceCowboy? – ivorykoder May 03 '11 at 11:14
  • Maybe I need to "reset" the FileReader or LineNumberReader to release memory once in a while? Or, is that sort of function already implemented to handle the memory automatically? – carloscloud May 03 '11 at 11:35
  • @heykarlm, FileReader doesn't use much memory, and LineNumberReader uses a constant amount of memory (~16 KB, even if unused). The only variable amount of memory used is the line read and the string you build for the output. – Peter Lawrey May 03 '11 at 11:48
  • I did some more debugging and the exception is more likely thrown in another part of my code (not mentioned above). – carloscloud May 03 '11 at 11:48
  • Finally, I think I found the failure: I forgot to clear an unused array at the end of every print pass. So if someone runs into a similar problem: 1. monitor your memory in some sort of task manager; 2. keep a really close eye on all your arrays. – carloscloud May 05 '11 at 13:00
  • If someone runs into a similar problem, and wants to post it on SO, post code which really has the problem, not an untested cut-down version. – Robin Green May 13 '11 at 10:31
  • I had to deal with batch importing of huge text files in a previous project. I eventually separated that part out into a Perl script; it not only used less memory, but also ran ~60 times faster (4 mins instead of 4 hrs). You might want to consider that if performance becomes an issue. – Alan Escreet May 16 '11 at 09:38

3 Answers


LineNumberReader extends BufferedReader, so it may be that the buffered reader is buffering too much. Running the program through a profiler should prove this one way or the other.

One of the constructors of BufferedReader takes a buffer size, and the same constructor is available on LineNumberReader.

replace:

LineNumberReader lnReader = new LineNumberReader(fReader);

with:

LineNumberReader lnReader = new LineNumberReader(fReader, 4096);
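Putting this answer's suggestion together with the OP's loop, a sketch might look like the following (`NumberedPrinter` and `printNumbered` are illustrative names, and 4096 is an arbitrary buffer size, not a tuned value; a try-with-resources block also fixes the close-on-error gap in the original):

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;

public class NumberedPrinter {
    // Prints each line prefixed with its line number; returns the line count.
    static int printNumbered(Reader source) throws IOException {
        int count = 0;
        // try-with-resources closes the reader even if printing fails;
        // 4096 bytes is a deliberately small, illustrative buffer.
        try (LineNumberReader lnReader = new LineNumberReader(source, 4096)) {
            String line;
            while ((line = lnReader.readLine()) != null) {
                System.out.println(lnReader.getLineNumber() + ": " + line);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A small in-memory source stands in for new FileReader("C:/huge_file.txt").
        printNumbered(new StringReader("first\nsecond"));
    }
}
```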

Maybe you should try raising the maximum heap size of the Java virtual machine (the `-Xmx` option)? Or check this link:

http://www.techrepublic.com/article/handling-large-data-files-efficiently-with-java/1046714


Use the RandomAccessFile class to read the file; then you won't have the out-of-memory problem anymore.
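A sketch of this suggestion (`RafPrint`, `printLines`, and the temp-file demo are illustrative; note that `RandomAccessFile.readLine()` decodes each byte as a Latin-1 character and reads unbuffered, so it is typically much slower than a buffered reader, and it does not by itself use less memory than the OP's loop):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.RandomAccessFile;

public class RafPrint {
    // Prints each line with its number using RandomAccessFile; returns the line count.
    static long printLines(String path) throws IOException {
        long lineNo = 0;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            String line;
            while ((line = raf.readLine()) != null) { // byte-oriented, Latin-1 only
                lineNo++;
                System.out.println(lineNo + ": " + line);
            }
        }
        return lineNo;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a small temp file standing in for the 600 GB original.
        File tmp = File.createTempFile("raf-demo", ".txt");
        tmp.deleteOnExit();
        try (FileWriter w = new FileWriter(tmp)) {
            w.write("first\nsecond\nthird\n");
        }
        System.out.println("lines = " + printLines(tmp.getPath()));
    }
}
```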