
I am trying to read a large file (>150MB) and return the file content as a ByteArrayOutputStream. This is my code...

private ByteArrayOutputStream readfileContent(String url) throws IOException{

    log.info("Entering readfileContent ");
    ByteArrayOutputStream writer=null;
    FileInputStream reader=null;

    try{
        reader = new FileInputStream(url);
        writer = new ByteArrayOutputStream();

        byte[] buffer = new byte[1024];

        int bytesRead = reader.read(buffer);
    while (bytesRead > -1) {
            writer.write(buffer, 0, bytesRead);
            buffer = new byte[1024];
        }

    }
    finally {
        writer.close();
    }

    log.info("Exiting readfileContent ");
    return writer;
}

I am getting a java.lang.OutOfMemoryError: Java heap space exception. I have tried increasing the Java heap size, but it still happens. Could someone please assist with this problem?

wattostudios
Anshu Kunal
  • Don't do that. The file is too large to be read into memory all at once. Why do you think you need a ByteArrayOutputStream? What will the caller do with this stream? Why not just return a FileInputStream and let the caller read from it? – Cheeso May 10 '12 at 12:01
  • You might want to read file contents in chunks – Rakesh May 10 '12 at 12:02
  • Side notes: 1. `in != null` is redundant -- `in` is never, ever going to become `null` all of a sudden. 2. What you are doing with the `length` var is kind of perverse. – Marko Topolnik May 10 '12 at 12:05
  • @Cheeso this bytestream will be input to FAST ESP – Anshu Kunal May 10 '12 at 13:31
  • @Rakesh how do I read in chunks? – Anshu Kunal May 10 '12 at 13:32
  • I don't know FAST ESP, but it seems to me, for handling large content, you'll want to provide a readable stream. There is no feasible way to handle 150 MB blobs besides streaming. – Cheeso May 10 '12 at 16:14
  • @AnshuKunal check [this](http://stackoverflow.com/questions/5510979/java-read-text-file-by-chunks) and [this](http://stackoverflow.com/questions/9588348/java-read-file-by-chunks) – Rakesh May 13 '12 at 15:35

5 Answers


You should return a BufferedInputStream and let the caller read from it. What you are doing is copying the whole file into memory as a ByteArrayOutputStream.

Your question doesn't say what you want to do with the file content, so we can only guess. There is a ServletOutputStream commented out in your original code. Did you want to write to that originally? Writing to it instead of to the ByteArrayOutputStream should work.

Kai

There is an error in the while loop. Change it to

 while (bytesRead > -1) {
     writer.write(buffer, 0, bytesRead);
     bytesRead = reader.read(buffer);
 }

Also, don't forget to close the reader.

(It will still need quite a large amount of memory.)

x22
  • It's not an 'also'/alternative: your original code was an infinite loop that would write the first 1024 bytes to the output until the JVM crashed. For debugging, you might want to just read the entire doc into a byte array and pass that (pretty sure the FAST API supports an array anywhere an OutputStream is accepted); that will let you narrow the issue down to reading the doc into a byte array vs. anything the FAST API might be doing with the stream once it's passed. – sbaker May 11 '12 at 11:50

Since you know how many bytes you are going to read, you can save time and space by creating the ByteArrayOutputStream with an initial size. This avoids the time and space overheads of "growing" the ByteArrayOutputStream's backing storage. (I haven't looked at the code, but it is probably using the same strategy as StringBuilder; i.e. doubling the allocation each time it runs out. That strategy may end up using up to 3 times the file size at peak usage.)

(And frankly, putting the output into a ByteArrayOutputStream when you know the size seems somewhat pointless. Just allocate a byte array that is big enough and read directly into that.)

Apart from that, the answer is that you need to make the heap bigger.

Stephen C
  • The file size is not constant, it's dynamic; it will change when reading the next file. I have tried to increase the size of the stream to the file size (file.length()) but still the same problem. – Anshu Kunal May 11 '12 at 06:01
  • If this is a file, you can get the file size using `file.length()`. In fact, the original version of your code did exactly that. If you are still getting OOM exceptions, then you just have to increase the heap size until the heap is big enough to hold the entire file. (Alternatively, change the rest of the program so that you don't need to hold the file in memory ...) – Stephen C May 11 '12 at 07:24

Your approach is going to use at least the same amount of memory as the file, but because ByteArrayOutputStream uses a byte array as its storage, it will be written to roughly 150,000 times (150 MB / 1 KB buffer) and will have to resize itself repeatedly along the way, which is not efficient. Upping the heap size to twice your file size and increasing the buffer to something much larger may allow it to run, but as other posters have said, it's far better to read from the file as you go, rather than reading it all into memory first.

barnyr
  • The exact worst-case requirement is 3*size -- there's a moment where both the old and the new arrays must coexist -- and 2*size must be available as a contiguous chunk of heap. So this may fail even when there's still a lot of free memory. – Marko Topolnik May 10 '12 at 12:11
  • No, not correct. The worst case is worse than you suppose. I don't know the Java memory allocator, but when you allocate a buffer, I suppose that it does not simply allocate "exactly" the right amount of memory. Often allocators return chunks that are a power of 2, or some other chunky size, so the worst-case scenario may be 5X or more. You cannot be sure. In any case your conclusion is correct: it will still be subject to OOME no matter what you do to the heap size. Streaming is the solution. – Cheeso May 10 '12 at 16:16

I have seen similar issues in C# on Windows caused by not having enough contiguous virtual memory on the host. If you're on Windows, you can try increasing the VM space.

Mike