-1

My requirement is to create 2 copies of the inputstream, one for Apache Tika File MimeType Detect and another to Output Stream.

private List<InputStream> copyInputStream(final InputStream pInputStream, final int numberOfCopies) throws UploadServiceException{
    final int bytesSize = 8192;
    List<InputStream> isList = null;        
    try(PushbackInputStream pushIS = new PushbackInputStream(pInputStream);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();){  
        byte[] buffer = new byte[bytesSize];
        for (int length = 0; ((length = pushIS.read(buffer)) > 0);) {
            baos.write(buffer, 0, length);
        }
        baos.flush();
        isList = new ArrayList();
        for(int i = 0; i < numberOfCopies ; i++){
            isList.add(new ByteArrayInputStream(baos.toByteArray()));
        }
    } catch (IOException ex) {
        throw new MyException(ErrorCodeEnum.IO_ERROR, ex);
    } catch (Exception ex) {            
        throw new MyException(ErrorCodeEnum.GENERIC_ERROR, ex);
    }
    return isList;
}

I see some performance issues

  1. The size of the byte array is twice the file size. I planned on using the ByteArrayOutputStream(int size) but I don't have the file size during the upload.
  2. I see the garbage collection is not happening very often, how does one handle such cases.

UPDATE

Based on the feedback

  • Removed PushbackInputStream
  • Added final byte[] byteArrayIS = baos.toByteArray();

    private List<InputStream> copyInputStream(final InputStream pInputStream, final int numberOfCopies) throws MyException{
        final int bytesSize = 8192;
        List<InputStream> isList = null;        
        try(ByteArrayOutputStream baos = new ByteArrayOutputStream();){  
            byte[] buffer = new byte[bytesSize];
            for (int length = 0; ((length = pInputStream.read(buffer)) > 0);) {
                baos.write(buffer, 0, length);
            }
            baos.flush();
            isList = new ArrayList();
            final byte[] byteArrayIS = baos.toByteArray();
            for(int i = 0; i < numberOfCopies ; i++){
                isList.add(new ByteArrayInputStream(byteArrayIS));
            }
        } catch (IOException ex) {
            throw new MyException(ErrorCodeEnum.IO_ERROR, ex);
        } catch (Exception ex) {
            if(ex instanceof MyException){
                throw ex;
            }
            throw new MyException(ErrorCodeEnum.GENERIC_ERROR, ex);
        }
        return isList;
    }
    
Prateek Agarwal
  • 407
  • 1
  • 8
  • 20

2 Answers2

4

The size of the byte array is twice the file size. I planned on using the ByteArrayOutputStream(int size) but I don't have the file size during the upload.

There is not much you can do about that if you have to use ByteArrayOutputStream and don't have a good estimate for the size. The ByteArrayOutputStream uses a simple (and time-efficient) strategy of doubling the byte array size when it fills up.

The Apache Commons IO version of ByteArrayOutputStream uses an alternative strategy that reduces copying, but it still over-allocates memory ... significantly.

I see the garbage collection is not happening very often, how does one handle such cases.

The correct approach is to not handle it. Leave the GC to run when the JVM decides that it is necessary. That is by far the most efficient way to to storage management in Java.

  • Running the GC explicitly using System.gc() can be disastrous for performance.
  • Running the GC (explicitly or letting the JVM doing it) is unlikely to give memory back to the OS.

In fact, the GC not running often is probably a good thing.


Then ... looking at your code ... I can see something that means you will be using many more copies of the data than you need to.

Each time you call toByteArray() you create a new copy of the data captured by the ByteArrayOutputStream. For what you are doing, this is unnecessary. Instead, you should call toByteArray() once create a single byte[] and wrap that single byte[] in multiple ByteArrayInputStream instances. You can be sure that the input streams will not modify the bytes in the byte[].

The use of PushbackInputStream in your example code is doesn't appear to achieve anything ... that can't be achieved better in other ways.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • Upvote, but there is no reason to use either `PushbackInputStream` or `BufferedInputStream` that I can see. He's using an 8k buffer and never backtracking. Entire question remains obscure. – user207421 Aug 08 '17 at 10:10
  • Yea ... I realized that when I looked again at the way he was reading ... – Stephen C Aug 08 '17 at 10:12
1

First, why do you use PushbackInputStream? It's completely irrelevant. You might want to wrap InputStream into a BufferedInputStream if inputStream is not already buffered.

Second, how did you measure byte array size? ByteArrayOutputStream manages internal allocation automatically. If baos.toByteArray() is giving you double data, see first how much you actually read from InputStream (hint: sum of all lengths in for loop).

As for garbage collection, it's automatic and nondeterministic, so if you don't know much about it, just leave it alone. In general, less GC activity means that enough memory is available and/or the program is not creating much garbage. That's a good thing! However, you should ensure that all streams are closed once you don't need them anymore, otherwise you'll get a memory leak. In particular, look for where pInputStream is closed, and for where all InputStreams in the resulting list are closed.

jurez
  • 4,436
  • 2
  • 12
  • 20
  • Using InputStream.read(buffer) throws NullPointerException Using PushbackInputStream.read(buffer) throws IOException to fix NullPointerException. – Prateek Agarwal Aug 08 '17 at 09:54
  • @PrateekAgarwal `NullPointerException` is a trivial programming error: either the input stream or the buffer was null. If that's the underlying problem here you need to update your question. – user207421 Aug 08 '17 at 09:56
  • 1
    @PrateekAgarwal - 1) The buffer is not `null`. Look at your code! 2) If it was `null`, then `NullPointerException` is the **correct** exception. Hiding it as an IOException is only going to make it harder to find the bug that lead to the `null`. 3) Why not test for `null` explicitly? – Stephen C Aug 08 '17 at 10:11