
I'm currently trying to write a custom stream proxy (let's call it that) that can transform the content of a given input stream and produce a modified output where necessary. I need this because I sometimes have to modify streams in my application (e.g. compress the data on the fly). The following class is fairly simple and uses internal buffering.

private static class ProxyInputStream extends InputStream {

    private final InputStream iStream;
    private final byte[] iBuffer = new byte[512];

    private int iBufferedBytes;

    private final ByteArrayOutputStream oBufferStream;
    private final OutputStream oStream;

    private byte[] oBuffer = emptyPrimitiveByteArray;
    private int oBufferIndex;

    ProxyInputStream(InputStream iStream, IFunction<OutputStream, ByteArrayOutputStream> oStreamFactory) {
        this.iStream = iStream;
        oBufferStream = new ByteArrayOutputStream(512);
        oStream = oStreamFactory.evaluate(oBufferStream);
    }

    @Override
    public int read() throws IOException {
        if ( oBufferIndex == oBuffer.length ) {
            iBufferedBytes = iStream.read(iBuffer);
            if ( iBufferedBytes == -1 ) {
                return -1;
            }
            oBufferIndex = 0;
            oStream.write(iBuffer, 0, iBufferedBytes);
            oStream.flush();
            oBuffer = oBufferStream.toByteArray();
            oBufferStream.reset();
        }
        return oBuffer[oBufferIndex++];
    }

}

Let's assume we also have a sample test output stream that simply adds a space character before every written byte ("abc" -> " a b c") like this:

private static class SpacingOutputStream extends OutputStream {

    private final OutputStream outputStream;

    SpacingOutputStream(OutputStream outputStream) {
        this.outputStream = outputStream;
    }

    @Override
    public void write(int b) throws IOException {
        outputStream.write(' ');
        outputStream.write(b);
    }

}

And the following test method:

private static void test(final boolean useDeflater) throws IOException {
    final FileInputStream input = new FileInputStream(SOURCE);
    final IFunction<OutputStream, ByteArrayOutputStream> outputFactory = new IFunction<OutputStream, ByteArrayOutputStream>() {
        @Override
        public OutputStream evaluate(ByteArrayOutputStream outputStream) {
            return useDeflater ? new DeflaterOutputStream(outputStream) : new SpacingOutputStream(outputStream);
        }
    };
    final InputStream proxyInput = new ProxyInputStream(input, outputFactory);
    final OutputStream output = new FileOutputStream(SOURCE + ".~" + useDeflater);
    int c;
    while ( (c = proxyInput.read()) != -1 ) {
        output.write(c);
    }
    output.close();
    proxyInput.close();
}

This test method simply reads the file content and writes it to another stream, possibly modifying it along the way. When the test method runs with useDeflater=false, the approach works as expected. But when it is invoked with useDeflater=true, it behaves strangely and writes almost nothing (apart from the header 78 9C). I suspect that the deflater class may not be designed for the approach I'd like to use, but I always believed that the ZIP format and deflate compression were designed to work on the fly.

I'm probably wrong at some point about the specifics of the deflate compression algorithm. What am I missing? Perhaps there is another way to write a "stream proxy" that behaves the way I want. How can I compress data on the fly when I'm limited to streams only?

Thanks in advance.


UPD: The following basic version works well with both the deflater and the inflater:

public final class ProxyInputStream<OS extends OutputStream> extends InputStream {

    private static final int INPUT_BUFFER_SIZE = 512;
    private static final int OUTPUT_BUFFER_SIZE = 512;

    private final InputStream iStream;
    private final byte[] iBuffer = new byte[INPUT_BUFFER_SIZE];
    private final ByteArrayOutputStream oBufferStream;
    private final OS oStream;
    private final IProxyInputStreamListener<OS> listener;

    private byte[] oBuffer = emptyPrimitiveByteArray;
    private int oBufferIndex;
    private boolean endOfStream;

    private ProxyInputStream(InputStream iStream, IFunction<OS, ByteArrayOutputStream> oStreamFactory, IProxyInputStreamListener<OS> listener) {
        this.iStream = iStream;
        oBufferStream = new ByteArrayOutputStream(OUTPUT_BUFFER_SIZE);
        oStream = oStreamFactory.evaluate(oBufferStream);
        this.listener = listener;
    }

    public static <OS extends OutputStream> ProxyInputStream<OS> proxyInputStream(InputStream iStream, IFunction<OS, ByteArrayOutputStream> oStreamFactory, IProxyInputStreamListener<OS> listener) {
        return new ProxyInputStream<OS>(iStream, oStreamFactory, listener);
    }

    @Override
    public int read() throws IOException {
        if ( oBufferIndex == oBuffer.length ) {
            if ( endOfStream ) {
                return -1;
            } else {
                oBufferIndex = 0;
                do {
                    final int iBufferedBytes = iStream.read(iBuffer);
                    if ( iBufferedBytes == -1 ) {
                        if ( listener != null ) {
                            listener.afterEndOfStream(oStream);
                        }
                        endOfStream = true;
                        break;
                    }
                    oStream.write(iBuffer, 0, iBufferedBytes);
                    oStream.flush();
                } while ( oBufferStream.size() == 0 );
                oBuffer = oBufferStream.toByteArray();
                oBufferStream.reset();
            }
        }
        return !endOfStream || oBuffer.length != 0 ? (int) oBuffer[oBufferIndex++] & 0xFF : -1;
    }

}

Lyubomyr Shaydariv
  • I'm a little lost. I would simply use the original `outputStream` when I don't want to compress and `new GZIPOutputStream(outputStream)` when I DO want to compress. That's all. Anyway, check that you are flushing the output streams. – helios Jan 17 '12 at 16:54
  • `ByteArrayOutputStream` != `BufferedOutputStream`. Very much so. – Viruzzo Jan 17 '12 at 16:55

3 Answers


I don't believe that DeflaterOutputStream.flush() does anything meaningful. The deflater will accumulate data until it has something to write out to the underlying stream. The only way to force the remaining data out is to call DeflaterOutputStream.finish(). However, this would not work for your current implementation, as you can't call finish() until you are entirely done writing.
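
With the default constructor this matches what the question observes. On Java 7 and later, DeflaterOutputStream also has a constructor taking a syncFlush flag; when it is true, flush() makes the deflater emit everything written so far. A minimal sketch, assuming Java 7+; the names SyncFlushDemo and compressedSizeAfterFlush are made up for illustration and are not part of the question's code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

// Hypothetical demo class, not part of the question's code.
final class SyncFlushDemo {

    // Writes data, calls flush() (but not finish()), and reports how many
    // compressed bytes had reached the underlying buffer at that point.
    static int compressedSizeAfterFlush(byte[] data, boolean syncFlush) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(sink, syncFlush)) {
            deflater.write(data);
            deflater.flush();
            // The size is captured here, before close() finishes the stream.
            return sink.size();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "Lorem ipsum dolor sit amet".getBytes(StandardCharsets.UTF_8);
        System.out.println("syncFlush=false: " + compressedSizeAfterFlush(text, false));
        System.out.println("syncFlush=true:  " + compressedSizeAfterFlush(text, true));
    }
}
```

Flushing after every small chunk usually costs some compression ratio, since each flush closes the current block and adds a few marker bytes.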

It's actually very difficult to write a compressed stream and read it within the same thread. In the RMIIO project I actually do this, but you need an arbitrarily sized intermediate output buffer (you basically need to push data in until something comes out compressed on the other end, and only then can you read it). You might be able to use some of the utility classes in that project to accomplish what you want to do.
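
For plain deflate specifically, the JDK (since Java 6) ships java.util.zip.DeflaterInputStream, which wraps an uncompressed InputStream and hands out compressed bytes as they are read. The sketch below shows it as one possible "proxy"; it is an alternative to, not a description of, what RMIIO does, and the class and method names are illustrative only:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical demo class; only the java.util.zip classes are real JDK API.
final class DeflaterInputDemo {

    // Drains a stream in small chunks, the way a downstream consumer would.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[512];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Reading from DeflaterInputStream yields the zlib-compressed form of the source.
    static byte[] compress(byte[] plain) throws IOException {
        try (InputStream in = new DeflaterInputStream(new ByteArrayInputStream(plain))) {
            return readAll(in);
        }
    }

    // Reading from InflaterInputStream reverses it.
    static byte[] decompress(byte[] compressed) throws IOException {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "Lorem ipsum dolor sit amet".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(plain);
        byte[] restored = decompress(packed);
        System.out.println(plain.length + " -> " + packed.length + " -> " + restored.length);
    }
}
```

Because the result is itself an InputStream, it can be passed on to any API that expects one.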

jtahlborn
  • "you basically need to push data in until something comes out compressed on the other end" this is one of the biggest problems (unless you can afford to compress the entire content at once); an inefficient (but simple) solution is to compress data in discrete "packets", provided you are able to do the decompression. – Viruzzo Jan 17 '12 at 18:36
  • Just added the listener with `void afterFlush(O outputStream) throws IOException;` to that code sample, and it finally compressed the "lorem ipsum" text sample. Thank you for pointing me to `.finish()`. :) – Lyubomyr Shaydariv Jan 18 '12 at 11:00
  • @LyubomyrShaydariv - You realize, though, that once you call finish, your compressed stream is done. You won't ever be able to handle more than 512 bytes of compressed data. Your current code is not really a "general" solution. – jtahlborn Jan 18 '12 at 12:57
  • Yes, my first thought was "yeah, it works". However, it only worked with the 446-byte `Lorem ipsum...` text. When I doubled the text to compress (892 bytes), it failed for the reason you mentioned. Anyway, today we completely reworked the `read()` method, and it finally works on the fly for both the deflater and the inflater. I just wanted to thank you for pointing out the `finish()` method, which I now call after the end of the input stream. Also, the source code in the question misses the fact that a -1 in the result byte array buffer is treated as the end of the stream rather than as the actual `0xFF` data. – Lyubomyr Shaydariv Jan 18 '12 at 18:26
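
The last point in that comment is the usual byte-to-int pitfall: Java bytes are signed, so returning a buffer element directly from read() turns the data byte 0xFF into -1, which callers interpret as end of stream. A small sketch of the difference (the class and method names are illustrative):

```java
// Hypothetical demo class showing why read() must mask with 0xFF.
final class UnsignedByteDemo {

    // Plain widening conversion: sign-extends, so 0xFF becomes -1.
    static int signed(byte b) {
        return b;
    }

    // Masking keeps the value in the 0..255 range that read() must return.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte data = (byte) 0xFF;
        System.out.println(signed(data));   // prints -1
        System.out.println(unsigned(data)); // prints 255
    }
}
```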

Why not use GZIPOutputStream?

I'm a little lost. I would simply use the original outputStream when I don't want to compress and new GZIPOutputStream(outputStream) when I DO want to compress. That's all. Anyway, check that you are flushing the output streams.

Gzip vs zip

Also: GZIP (compressing a stream, which is what you're doing) is one thing, and writing a valid zip file (file headers, file directory, entries (header, data)*) is another. For the latter, check ZipOutputStream.
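
To make the distinction concrete, here is a minimal sketch that produces both formats in memory. GZIPOutputStream and ZipOutputStream are the real java.util.zip classes; the demo class, method names and the entry name are made up for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class contrasting a gzip stream with a zip archive.
final class GzipVsZipDemo {

    // GZIP: one compressed stream, no notion of named entries.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(data);
        }
        return sink.toByteArray();
    }

    // ZIP: an archive, so data must be written inside a named entry.
    static byte[] zip(String entryName, byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(sink)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(data);
            zip.closeEntry();
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "Lorem ipsum dolor sit amet".getBytes(StandardCharsets.UTF_8);
        System.out.println("gzip bytes: " + gzip(text).length);
        System.out.println("zip bytes:  " + zip("lorem.txt", text).length);
    }
}
```

In both cases the compressed output is only complete once the stream has been closed (or finished).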

helios
  • Thank you for the reply. Using GZIPOutputStream has no effect, and neither does ZipOutputStream. I simply get the output stream almost completely truncated: the 446-byte "Lorem ipsum ..." becomes the two bytes I mentioned in the question. I cannot use an OutputStream directly because the requirement is to get an InputStream that can be passed to a JDBC prepared statement (because the incoming data may be huge). That's why I'm looking for a class to act as a proxy, like MyApp(inputStream) --> [compressor] --> JDBC(inputStream). – Lyubomyr Shaydariv Jan 17 '12 at 17:18

Be careful: if you use the method int read(byte b[], int off, int len) somewhere and an exception is thrown at the line final int iBufferedBytes = iStream.read(iBuffer); you will get stuck in an infinite loop.
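
The likely reason is that the default InputStream.read(byte[], int, int) only propagates an IOException thrown while reading the first byte; exceptions from later read() calls are caught and treated as end of data, so the caller sees a short read instead of a failure. A minimal sketch of that behaviour, using a made-up FailingStream that is not part of the question's code:

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo of how the inherited bulk read hides a late IOException.
final class SwallowedExceptionDemo {

    // Made-up stream: the first read() succeeds, every later one fails.
    static final class FailingStream extends InputStream {
        private boolean first = true;

        @Override
        public int read() throws IOException {
            if (first) {
                first = false;
                return 'a';
            }
            throw new IOException("simulated failure");
        }
    }

    // Uses the bulk read inherited from InputStream and returns its result.
    static int bulkRead() throws IOException {
        byte[] buffer = new byte[4];
        return new FailingStream().read(buffer, 0, buffer.length);
    }

    public static void main(String[] args) throws IOException {
        // Prints 1: one byte was read and the IOException was swallowed.
        System.out.println(bulkRead());
    }
}
```

As far as I can tell, in the updated ProxyInputStream the index oBufferIndex is reset to 0 before the refill, so after such a swallowed exception the old oBuffer contents can be served again. Resetting the index only after a successful refill, or overriding read(byte[], int, int), should avoid that.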