
I have just read a 167 MB file with 1,884,000 lines, using a BufferedReader so that I can process it line by line.

What I noticed is that reading the file got slower and slower as the current line number increased (in this case, it took me 3 h 30 min to finish it).

I know using NIO may speed this up, but I want to read the file line by line.

My code is as below; could anyone give me some suggestions? Thanks a lot!

String htmlContentPath = html.getAbsolutePath();
// try-with-resources closes the reader even if proc() throws
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(htmlContentPath)))) {
    String line;
    int cnt = 0;
    while ((line = reader.readLine()) != null) {
        this.proc(line);              // per-line processing (not shown)
        if ((cnt++ % 2000) == 0) {    // log progress every 2000 lines
            logger.info("current line number:\t" + cnt);
        }
    }
}
Michael Myers
Judking
  • What does `proc(line)` do? Could that be slowing it down the longer it runs? – Michael Myers Aug 08 '13 at 15:42
  • Is the `this.proc(line)` call necessary on every iteration? – kevmo314 Aug 08 '13 at 15:42
  • 1. Use multiple threads 2. Sync the threads properly 3. Use Java NIO `Channels` :) – An SO User Aug 08 '13 at 15:48
  • Is there a reason you're not using FileReader? – Steve Kuo Aug 08 '13 at 15:53
  • Even reading slowly I get 100 MB/s http://vanillajava.blogspot.co.uk/2011/01/how-slow-can-you-readwrite-files-in.html I suspect it is not the reading but what you do with the text which is slow. – Peter Lawrey Aug 08 '13 at 16:11
  • `slower and slower as the current line number increased` This is a sure sign that it is what you are doing with the input that is getting slower and slower. I suggest you comment out `this.proc(line);` and run it again. – Peter Lawrey Aug 08 '13 at 16:14
  • @SteveKuo `FileReader` is exactly equivalent to an `InputStreamReader` reading from a `FileInputStream`. But it's good to get into the habit of making the `InputStreamReader` explicit because that allows you to control the character encoding - `FileReader` always uses the system default. – Ian Roberts Aug 08 '13 at 16:25
  • @LittleChild Multiple threads won't accomplish anything. There is only one file, on one filesystem, on one disk probably, and none of those things are multi-threaded. There's no reason to believe a `Channel` will be any quicker. – user207421 Aug 21 '13 at 01:04
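A minimal sketch of the check suggested in the comments: run the same read loop with the processing swapped for a no-op and record how long each batch of 2000 lines takes. It assumes a UTF-8 input file (`Files.newBufferedReader` with UTF-8 throws on malformed input, so change the charset if yours differs, which also addresses the explicit-encoding point above). The class `ReadOnlyTiming` and method `timeBatches` are hypothetical names, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ReadOnlyTiming {

    /**
     * Reads the file line by line, hands each line to the processor, and records
     * the elapsed nanoseconds for every batch of batchSize lines.
     * The last entry may cover a partial batch.
     */
    static List<Long> timeBatches(Path file, int batchSize, Consumer<String> processor)
            throws IOException {
        List<Long> batchNanos = new ArrayList<>();
        // Explicit charset instead of the platform default
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            int inBatch = 0;
            long batchStart = System.nanoTime();
            while ((line = reader.readLine()) != null) {
                processor.accept(line);
                if (++inBatch == batchSize) {
                    long now = System.nanoTime();
                    batchNanos.add(now - batchStart);
                    batchStart = now;
                    inBatch = 0;
                }
            }
            if (inBatch > 0) {
                batchNanos.add(System.nanoTime() - batchStart);
            }
        }
        return batchNanos;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        // Reading only: the lambda discards every line.
        // A second run would pass a lambda that calls the real proc(line).
        List<Long> readOnly = timeBatches(file, 2000, line -> { });
        if (readOnly.isEmpty()) {
            System.out.println("empty file");
            return;
        }
        System.out.println("batches: " + readOnly.size()
                + ", first batch ms: " + readOnly.get(0) / 1_000_000
                + ", last batch ms: " + readOnly.get(readOnly.size() - 1) / 1_000_000);
    }
}
```

If the per-batch times stay roughly flat with the no-op processor but climb once `proc` is plugged back in, the slowdown is in the processing rather than in `BufferedReader`.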

4 Answers


You should be able to find an answer here:

http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly

For the best Java read performance, there are four things to remember:

  • Minimize I/O operations by reading an array at a time, not a byte at a time. An 8Kbyte array is a good size.

  • Minimize method calls by getting data an array at a time, not a byte at a time. Use array indexing to get at bytes in the array.

  • Minimize thread synchronization locks if you don't need thread safety. Either make fewer method calls to a thread-safe class, or use a non-thread-safe class like FileChannel and MappedByteBuffer.

  • Minimize data copying between the JVM/OS, internal buffers, and application arrays. Use FileChannel with memory mapping, or a direct or wrapped array ByteBuffer.
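A rough sketch of the "array at a time" advice, assuming the goal is only to count lines (it does not decode characters or hand lines to `proc`). It reads through a `FileChannel` into a reused 8 KB heap `ByteBuffer` and scans the backing `byte[]` with plain indexing. Counting `'\n'` bytes is valid for ASCII and UTF-8, since byte 0x0A never occurs inside a multi-byte UTF-8 sequence. `ChunkedLineCounter` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class ChunkedLineCounter {

    /** Counts '\n' bytes by reading the file through a FileChannel in 8 KB chunks. */
    static long countNewlines(Path file) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(8 * 1024); // heap buffer backed by a byte[]
        byte[] array = buffer.array();
        long newlines = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = channel.read(buffer)) != -1) {
                // The n bytes just read sit at array[0..n) because position started at 0
                for (int i = 0; i < n; i++) {
                    if (array[i] == '\n') {
                        newlines++;
                    }
                }
                buffer.clear(); // reset position so the same buffer is reused
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countNewlines(Paths.get(args[0])));
    }
}
```

Note that, as other commenters point out, reading speed is unlikely to be the bottleneck here; this only illustrates the article's points.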

Khinsu

This could be caused by swapping, depending on the memory footprint of your proc method. You can attach VisualVM to your process to see the heap status, and then either tune the heap size (-Xms, -Xmx) or reduce the memory consumption of your method.

Cheers.
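Before reaching for VisualVM, a quick and much cruder check is to log heap usage next to the existing line-count log and see whether it keeps climbing. This sketch uses only `Runtime` and cannot show OS-level swapping itself; the class `HeapLogger` and its method names are hypothetical.

```java
final class HeapLogger {

    /** Heap currently in use, in MB (allocated heap minus its free part). */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    /** Maximum heap the JVM will attempt to use (controlled by -Xmx), in MB. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // In the question's loop, this would go next to the logger.info(...) call
        System.out.println("heap used: " + usedHeapMb() + " MB of max " + maxHeapMb() + " MB");
    }
}
```

If the used heap keeps approaching the maximum as the line count grows, the garbage collector is probably working harder and harder, which matches the slowdown described.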


When I first read your question I was going to suggest that you comment out the call to proc() and then use some of the other answers to speed the reading of the file (which should be the entire execution time because you commented out the processing call).

On further thought, I'd suggest you use a profiler (without any lines commented out). If you're using Eclipse, there are several JVM profilers on the Eclipse Marketplace, and I'm sure other IDEs have integrated profilers as well. A profiler shows you the hotspots in your code: the places where your program spends most of its time. That information, plus your knowledge of the program logic, will suggest ways to speed up the worst bottlenecks.

This is an iterative process with better and better results.

I also recommend that for your testing you use a much smaller sample file at first.
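One simple way to get such a sample is to copy the first N lines of the big file into a new file. This is a minimal sketch assuming UTF-8 text; `SampleFile` and `copyFirstLines` are hypothetical names, and the output uses the platform line separator.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SampleFile {

    /** Copies at most maxLines lines from source to target; returns how many were copied. */
    static int copyFirstLines(Path source, Path target, int maxLines) throws IOException {
        int copied = 0;
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            // Check the limit first so no extra line is read once maxLines is reached
            while (copied < maxLines && (line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Example: java SampleFile big.html sample.html 20000
        int n = copyFirstLines(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
        System.out.println("copied " + n + " lines");
    }
}
```

Profiling a run over, say, the first few tens of thousands of lines should already show whether per-line cost grows, without waiting hours.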

Chris Gerken
  • @Ivan. I'm sure. I only know Eclipse, but profiling is such a key feature that you'll find it on any IDE. No slight to NetBeans or IntelliJ intended. – Chris Gerken Aug 08 '13 at 19:49

This sounds like a memory issue to me (slowdowns often occur as garbage collection has to run more and more often because memory is running short).

The code you posted doesn't look like it should be slowing down as the line number increases (assuming the proc() call is "clean").
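We don't know what `proc()` does, but as a purely hypothetical illustration of a proc that is not "clean": if it accumulates text with `String +=`, every call copies everything accumulated so far, so each line costs more than the one before. A `StringBuilder` produces the same text with amortized cost proportional to the appended line. Both processor classes below are made up for illustration.

```java
final class AccumulatingProcessors {

    /** Hypothetical "unclean" proc: every call builds a new String of the full length. */
    static final class ConcatProcessor {
        private String content = "";

        void proc(String line) {
            content += line + "\n"; // copies all previously accumulated characters
        }

        int length() {
            return content.length();
        }
    }

    /** Same result, but appends into a growable buffer instead of copying it each time. */
    static final class BuilderProcessor {
        private final StringBuilder content = new StringBuilder();

        void proc(String line) {
            content.append(line).append('\n');
        }

        int length() {
            return content.length();
        }
    }

    public static void main(String[] args) {
        ConcatProcessor slow = new ConcatProcessor();
        BuilderProcessor fast = new BuilderProcessor();
        for (int i = 0; i < 5_000; i++) {
            String line = "line " + i;
            slow.proc(line);
            fast.proc(line);
        }
        // Same accumulated length; only the cost per call differs
        System.out.println(slow.length() == fast.length());
    }
}
```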

I second Chris G's advice to remove the proc() call and see whether the slowdown still occurs when you are just reading the file and not processing any of its lines.

I would also add that you can try using the -Xmx and -Xms flags to give the JVM access to more memory at the outset.

Here is a question that may be relevant: Java threads slow down towards the end of processing

Community
Ivan