I've observed that many stream-processing libraries (compression, encryption, etc.) only support each transformation in one direction.

GZIPInputStream/GZIPOutputStream assume that you will only inflate-on-read (wrapping an InputStream) and deflate-on-write (wrapping an OutputStream).

Likewise, many examples of PGP via BouncyCastle only perform decrypt-on-read (again, wrapping an InputStream) and encrypt-on-write (again, wrapping an OutputStream).

Based on a mix of examples I've encountered across SO and elsewhere, I'm trying to write a set of generic utility functions (probably piping through PipedInputStream/PipedOutputStream) that support bi-directional stream processing. With GZIP, for example:

  • Inflate on read (supported out-of-the-box)
  • Deflate on write (supported out-of-the-box)
  • Inflate on write (missing; example use-case of inflating compressed data on-the-fly from a socket to disk)
  • Deflate on read (missing; example use-case of deflating uncompressed data on-the-fly from disk to a socket)

For the case of running an InputStream through OutputStream processing (deflate on read), the following code works:

(see also: IOUtils)

static InputStream outputTransformingInputStream(InputStream inputStream, Function<OutputStream, OutputStream> transformer) {
    try {
        var pipedInputStream = new PipedInputStream();
        var pipedOutputStream = new PipedOutputStream(pipedInputStream);
        var transformerThread = new Thread(() -> {
            try (var transformerOutputStream = transformer.apply(pipedOutputStream)) {
                IOUtils.copy(inputStream, transformerOutputStream);
                transformerOutputStream.flush();
            } 
            catch (IOException exception) {
                throw new RuntimeException(exception);
            }
        });
        transformerThread.start();
        return pipedInputStream;
    }
    catch (IOException exception) {
        throw new RuntimeException(exception);
    }
}

Used with, for example:

// Deflating when reading from an InputStream
var inflatedInputStream = new FileInputStream("/path/to/file.txt");
var deflatingInputStream = outputTransformingInputStream(inflatedInputStream, GZIPOutputStream::new);
var deflatedBytes = deflatingInputStream.readAllBytes(); // Works 

The inverse approach isn't working (it results in an empty byte[]). Perhaps I've just been staring at streams and pipes too long and have the order out of whack:

static OutputStream inputTransformingOutputStream(OutputStream outputStream, Function<InputStream, InputStream> transformer) {
    try {
        var pipedOutputStream = new PipedOutputStream();
        var pipedInputStream = new PipedInputStream(pipedOutputStream);
        var transformerThread = new Thread(() -> {
            try (var transformingInputStream = transformer.apply(pipedInputStream)) {
                IOUtils.copy(transformingInputStream, outputStream);
                outputStream.flush();
            }
            catch (IOException exception) {
                throw new RuntimeException(exception);
            }
        });
        transformerThread.start();
        return pipedOutputStream;
    }
    catch (IOException exception) {
        throw new RuntimeException(exception);
    }
}

This example invocation results in an empty inflatedBytes:

// Inflating when writing to an OutputStream
var inflatedOutputStream = new ByteArrayOutputStream();
var inflatingOutputStream = StreamFactory.inputTransformingOutputStream(inflatedOutputStream, GZIPInputStream::new);
inflatingOutputStream.write(deflatedBytes); // deflatedBytes previously deflated, see above
inflatingOutputStream.flush();
var inflatedBytes = inflatedOutputStream.toByteArray(); // Empty 

So, question(s):

  • Are the PipedInputStream/PipedOutputStream utility method examples the best approach for genericizing this functionality?
  • If so, can someone tell me where I went wrong in the inputTransformingOutputStream function?
  • If not, what would be a better approach to create generic support for bi-directional stream processing?

My continued mention of GZIP and inflate/deflate probably makes it look like that's specifically what I'm trying to do.

More generically, given libraries Foo and Bar that do unidirectional transformations of stream data:

interface FooService {
     InputStream enfooify(InputStream inputStream);
     OutputStream defooify(OutputStream outputStream);
}

interface BarService {
    InputStream enbarify(InputStream inputStream);
    OutputStream debarify(OutputStream outputStream);
}

I'm trying to create a generic solution to:

Less about GZIP, or PGP, or anything in particular, and more about supporting generic bi-directional stream data transformation when a library doesn't already.

Dan Lugg
  • Voting to close because duplicate? X/Y? Too broad? – Dan Lugg Apr 05 '23 at 20:51
  • An example that reproduces the issue would help. Can you provide the definition of IOUtils.copy and initialise the inflatedInputStream. – jon hanson Apr 05 '23 at 20:58
  • Possibly related: https://stackoverflow.com/questions/66771680/javas-gzipoutputstream-refuses-to-flush (note that I am the author of the answer to that question, though it is not necessarily the problem that you observe). – Thomas Kläger Apr 05 '23 at 21:50
  • @jon-hanson -- Updated the question accordingly; `inflatedInputStream` is just read off disk, or from memory (*the former in this example*), and `IOUtils` is from Apache Commons which I've linked. – Dan Lugg Apr 05 '23 at 21:59
  • @DanLugg Your current implementation is unsafe/unreliable as you have no idea or control over whether the background thread is up to date with the last read/write/flush operation. You were lucky the first part worked, just add `Thread.sleep(1000)` before `var inflatedBytes = inflatedOutputStream.toByteArray();` and it will probably fix the second case. – DuncG Apr 06 '23 at 14:37
  • @DuncG You got it; alright, I'm going to take a different approach to this. – Dan Lugg Apr 06 '23 at 17:31
  • @DanLugg You would be able to make it work on close stream by overriding Pipe classes and make `close()` join on the thread to ensure final state is correct, though not possible to handle `flush()` – DuncG Apr 06 '23 at 17:37
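
DuncG's suggestion of joining the worker thread on close() could be sketched roughly as below. This is an illustration only, under a few assumptions: the class name JoiningPipeSketch and the IOTransformer interface are hypothetical (the latter stands in for java.util.function.Function so that constructors declaring IOException, such as GZIPInputStream's, can be passed as method references), and InputStream.transferTo is used in place of IOUtils.copy. As noted in the comment, only close() gives a guarantee here; flush() still says nothing about what the worker has written so far.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class JoiningPipeSketch {

    // Hypothetical replacement for Function<T, T> that permits IOException.
    @FunctionalInterface
    interface IOTransformer<T> {
        T apply(T stream) throws IOException;
    }

    static OutputStream inputTransformingOutputStream(
            OutputStream sink, IOTransformer<InputStream> transformer) throws IOException {
        var pipeOut = new PipedOutputStream();
        var pipeIn = new PipedInputStream(pipeOut);
        var failure = new AtomicReference<IOException>();

        // The worker drains the pipe through the transformer into the sink.
        var worker = new Thread(() -> {
            try (var transformed = transformer.apply(pipeIn)) {
                transformed.transferTo(sink);
                sink.flush();
            } catch (IOException e) {
                failure.set(e); // surfaced to the caller in close()
            }
        });
        worker.start();

        return new FilterOutputStream(pipeOut) {
            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                out.write(b, off, len); // skip FilterOutputStream's byte-at-a-time default
            }

            @Override
            public void close() throws IOException {
                super.close(); // closing the pipe signals end-of-stream to the worker
                try {
                    worker.join(); // wait until the sink holds all transformed data
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted while waiting for the worker", e);
                }
                if (failure.get() != null) {
                    throw failure.get();
                }
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "hello, pipes".getBytes(StandardCharsets.UTF_8);
        var deflated = new ByteArrayOutputStream();
        try (var gzip = new GZIPOutputStream(deflated)) {
            gzip.write(raw);
        }
        var inflated = new ByteArrayOutputStream();
        try (var out = inputTransformingOutputStream(inflated, GZIPInputStream::new)) {
            out.write(deflated.toByteArray());
        }
        // Only read the sink after close() has returned.
        System.out.println(inflated.toString(StandardCharsets.UTF_8));
    }
}
```

The important difference from the code in the question is that the caller closes the returned stream and reads the sink only afterwards; in the original invocation the pipe is never closed, so the worker may not have produced anything by the time toByteArray() runs.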

1 Answer

While I share your observation, I do not see an issue here. Let's look at the two missing examples:

  • Inflate on write (missing; example use-case of inflating compressed data on-the-fly from a socket to disk)
  • Deflate on read (missing; example use-case of deflating uncompressed data on-the-fly from disk to a socket)

In both cases you need to read the data, process it (inflate/deflate), and write it. For these three steps it makes little difference whether you attach the processing to the left side (read and inflate/deflate) or the right side (inflate/deflate and write). For the same reason it also makes little difference if one of them is on the left and the other on the right.

You can definitely build the use cases you mentioned with the existing classes.
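
As a rough sketch of what that might look like using only JDK classes (the class and method names are made up for illustration, and in-memory streams stand in for the socket and the file):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ExistingClassesSketch {

    // "Inflate from a socket to disk": inflate on the read side, then copy.
    // Closing the GZIPInputStream also closes compressedSource.
    static long inflateTo(InputStream compressedSource, OutputStream plainSink) throws IOException {
        try (var inflating = new GZIPInputStream(compressedSource)) {
            return inflating.transferTo(plainSink);
        }
    }

    // "Deflate from disk to a socket": copy, deflating on the write side.
    // Closing the GZIPOutputStream writes the GZIP trailer and closes compressedSink.
    static long deflateTo(InputStream plainSource, OutputStream compressedSink) throws IOException {
        try (var deflating = new GZIPOutputStream(compressedSink)) {
            return plainSource.transferTo(deflating);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "some uncompressed data".getBytes(StandardCharsets.UTF_8);

        var compressed = new ByteArrayOutputStream();
        deflateTo(new ByteArrayInputStream(raw), compressed);

        var restored = new ByteArrayOutputStream();
        inflateTo(new ByteArrayInputStream(compressed.toByteArray()), restored);

        System.out.println(restored.toString(StandardCharsets.UTF_8));
    }
}
```

Both helpers own the copy loop themselves, which is why the side on which the transformation sits stops mattering.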

Queeg
  • Thank you @Queeg -- So, focusing on the second example, deflate on read (*or perhaps encrypt on read, for the BC/PGP example I also provided*) the primary use case here is other APIs that expect an `InputStream` (*looking at AWS SDK*) that is used as the source from which to write to an external resource (*reading off your provided `InputStream` to persist a file/object*). If the provided wrapped `InputStream` would deflate/encrypt on the fly, I can avoid creating temp files to which I'd otherwise need to write the entire deflated/encrypted payload, and then create an `InputStream` from. – Dan Lugg Apr 05 '23 at 22:05
  • ... The issue of course being, that I'm presented with a wrapped `OutputStream` that will deflate/encrypt on write out-of-the-box in the GZIP and BC/PGP cases. Was there an obvious issue in my `inputTransformingOutputStream` implementation? As mentioned, `outputTransformingInputStream` seems to be working just fine. – Dan Lugg Apr 05 '23 at 22:06
  • Maybe what you are really searching for is a way to chain operations, such as you can do on the Un*x command line with pipe characters: Inputstream->Process->OutputStream->InputStream->Process->Outputstream... So all you might need is an OutputStream that can provide the data as an InputStream. Whether it uses the disk or some other control mechanism (such as block the write operation) is up to the implementation. – Queeg Apr 06 '23 at 10:23
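
For the DEFLATE case specifically, the JDK does ship the two "missing" directions: java.util.zip.DeflaterInputStream compresses as it is read, and java.util.zip.InflaterOutputStream decompresses as it is written. Both use the zlib format by default rather than GZIP framing, so the sketch below (class and method names are hypothetical) is not a drop-in replacement for the GZIP or BC/PGP cases. It only illustrates that an `InputStream` which deflates on the fly is possible without pipes, threads or temp files when the library provides it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterInputStream;
import java.util.zip.InflaterOutputStream;

final class ZlibBothDirectionsSketch {

    // Deflate on read: bytes read from the returned stream are zlib-compressed.
    static InputStream deflateOnRead(InputStream plainSource) {
        return new DeflaterInputStream(plainSource);
    }

    // Inflate on write: zlib-compressed bytes written here arrive inflated in plainSink.
    static OutputStream inflateOnWrite(OutputStream plainSink) {
        return new InflaterOutputStream(plainSink);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "deflate on read, inflate on write".getBytes(StandardCharsets.UTF_8);

        byte[] deflated;
        try (var in = deflateOnRead(new ByteArrayInputStream(raw))) {
            deflated = in.readAllBytes();
        }

        var inflated = new ByteArrayOutputStream();
        try (var out = inflateOnWrite(inflated)) {
            out.write(deflated);
        }

        System.out.println(inflated.toString(StandardCharsets.UTF_8));
    }
}
```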