
Given that I have a Flux<String> of unknown size, how can I convert it into an InputStream that another library is expecting?

For example, with WebClient I can achieve that using this approach:

WebClient.create("example.com").get().exchange()
    .flatMap { it.bodyToMono(InputStreamResource::class.java) }
    .map { it.inputStream }

but I can't figure out how to do the same when I have a Flux<String> as input.

  • Where are you getting the Flux from? You can start by looking at DataBufferUtils, which can read a resource into a DataBuffer and into an InputStream. – Kevin Hussey Aug 07 '18 at 05:10
  • I got it from external queue and some processing afterwards. Yeah, I've seen DataBufferUtils, but wasn't able to figure out how can I convert Flux to DataBuffer and then to InputStream. Do you have an example? Thanks – Artem Yarulin Aug 08 '18 at 06:50
  • Is the signature of the API InputStream or Flux? Can you expand your sample to include the full code? – Kevin Hussey Aug 08 '18 at 08:32
  • 1
    You can have a look here for tips, https://github.com/entzik/reactive-spring-boot-examples/blob/master/src/main/java/com/thekirschners/springbootsamples/reactiveupload/ReactiveUploadResource.java - but generally reading from an InputStream is blocking/pulling data, while rx is more pushing data downstream. – Kevin Hussey Aug 08 '18 at 08:46

4 Answers


There are probably many ways to do this. One possibility is to use PipedInputStream and PipedOutputStream.

The way this works is that you link an output stream to an input stream, such that everything you write to the output stream can be read from the linked input stream, thereby creating a pipe between the two.

PipedInputStream in = new PipedInputStream();
PipedOutputStream out = new PipedOutputStream(in);

There is one caveat, though: according to the documentation of the piped streams, the writing process and the reading process must happen on separate threads, otherwise we may cause a deadlock.
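To make the caveat concrete, here is a minimal standalone sketch (Java 9+, and not yet part of the reactive solution) with the writer on its own thread:

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeDemo {
    public static void main(String[] args) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        // The writer runs on its own thread; writing on the reader's thread
        // could deadlock once the pipe's bounded buffer fills up.
        new Thread(() -> {
            try (out) {
                out.write("hello pipe".getBytes());
            } catch (IOException ignored) {
            }
        }).start();

        // The main thread reads until the writer closes its end of the pipe.
        System.out.println(new String(in.readAllBytes()));
    }
}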

So, coming back to our reactive stream scenario: we can create a pipe (as shown above), subscribe to the Flux, and write every chunk of data we receive to the piped output stream. Whatever we write there becomes available for reading at the other side of the pipe, in the corresponding input stream, and that input stream is the one we can share with our non-reactive method.

We just have to be extra careful to subscribe to the Flux on a separate thread, e.g. with subscribeOn(Schedulers.elastic()).

Here's a very basic implementation of such a subscriber:

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Objects;

import org.reactivestreams.Subscription;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import reactor.core.publisher.BaseSubscriber;

class PipedStreamSubscriber extends BaseSubscriber<byte[]> {

    private final Logger logger = LoggerFactory.getLogger(this.getClass());

    private final PipedInputStream in;
    private PipedOutputStream out;

    PipedStreamSubscriber(PipedInputStream in) {
        Objects.requireNonNull(in, "The input stream must not be null");
        this.in = in;
    }

    @Override
    protected void hookOnSubscribe(Subscription subscription) {
        //change if you want to control back-pressure
        super.hookOnSubscribe(subscription);
        try {
            this.out = new PipedOutputStream(in);
        } catch (IOException e) {
            //TODO throw a contextual exception here
            throw new RuntimeException(e);
        }
    }

    @Override
    protected void hookOnNext(byte[] payload) {
        try {
            out.write(payload);
        } catch (IOException e) {
            //TODO throw a contextual exception here
            throw new RuntimeException(e);
        }
    }

    @Override
    protected void hookOnComplete() {
        close();
    }

    @Override
    protected void hookOnError(Throwable error) {
        //TODO handle the error or at least log it
        logger.error("Failure processing stream", error);
        close();
    }

    @Override
    protected void hookOnCancel() {
        close();
    }

    private void close() {
        try {
            if (out != null) {
                out.close();
            }
        } catch (IOException e) {
            //probably just ignore this one or simply log it
        }
    }
}

And using this subscriber, I could define a very simple utility method that turns a Flux<byte[]> into an InputStream, somewhat as follows:

static InputStream createInputStream(Flux<byte[]> flux) {

    PipedInputStream in = new PipedInputStream();
    flux.subscribeOn(Schedulers.elastic())
        .subscribe(new PipedStreamSubscriber(in));

    return in;
}

Notice that I was extra careful to close the output stream when the flow is done, when an error occurs, or when the subscription is cancelled; otherwise we run the risk of blocking on the read side, waiting for more input to arrive. Closing the output stream is what signals the end of the input stream at the other side of the pipe.

And now that InputStream can be consumed just like any regular stream, and therefore you can pass it around to your non-reactive method, e.g.

Flux<byte[]> jedi = Flux.just("Luke\n", "Obi-Wan\n", "Yoda\n").map(String::getBytes);

try (InputStream in = createInputStream(jedi)) {
    byte[] data = new byte[5];
    int size = 0;
    while ((size = in.read(data)) > 0) {
        System.out.printf("%s", new String(data, 0, size));
    }
} 

The code above yields:

Luke
Obi-Wan
Yoda
Edwin Dalorzo
  • Could you add some information about memory consumption and delay? For instance, if I were to use this solution on a 100 MB file, would the file be loaded entirely in memory? Or would your `System.out.printf` start outputting the beginning as soon as the flux starts? Thanks! – Nicolas Raoul Feb 13 '19 at 07:12

Edwin's answer didn't do the trick for me, as errors in the upstream got swallowed by the subscriber and did not propagate to the consumer of the InputStream. Still, inspired by Edwin's answer, I found a different solution. Here is an example of consuming a Flux<ByteArray> and passing it downstream as an InputStream. The example includes decryption to highlight the possibility of manipulating the OutputStream even after the Flux<ByteArray> has been completely consumed, eventually producing an error that gets propagated downstream.

fun decryptAndGetInputStream(flux: Flux<ByteArray>, cipher: Cipher): Flux<InputStream> {
    val inputStream = PipedInputStream()
    val outputStream = PipedOutputStream(inputStream)
    val isStreamEmitted = AtomicBoolean(false)
    
    return flux.handle<InputStream> { byteArray, sink ->
        try {
            outputStream.write(cipher.update(byteArray))
            // emit the input stream as soon as we get the first chunk of bytes
            // make sure we do it only once
            if (!isStreamEmitted.getAndSet(true)) {
                sink.next(inputStream)
            }
        } catch (e: Exception) {
            // catch all errors to pass them to the sink
            sink.error(e)
        }
    }.doOnComplete { 
        // here we have a last chance to throw an error  
        outputStream.write(cipher.doFinal())
    }.doOnTerminate {
        // error thrown here won't get propagated downstream
        // since this callback is triggered after flux's completion 
        outputStream.flush()
        outputStream.close()
    }
}

The trick here is to use the handle operator to produce a Flux that emits at most one item. Unlike a Mono, the Flux won't be terminated immediately after the first emission. Although it is not going to emit any more items, it stays "open" to emit an eventual error that occurs after the first emission.
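The same behavior can be sketched in plain Java, independent of the decryption example above (a toy illustration, not production code):

// handle() emits a single item here, but the resulting Flux stays open,
// so a failure from a later upstream element still reaches the subscriber.
Flux<String> once = Flux.just(1, 2, 3)
        .<String>handle((i, sink) -> {
            if (i == 1) {
                sink.next("first and only emission");
            } else if (i == 3) {
                sink.error(new IllegalStateException("late failure"));
            }
        });

once.subscribe(
        value -> System.out.println("next: " + value),
        error -> System.out.println("error: " + error.getMessage()));
// prints "next: first and only emission", then "error: late failure"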

Here is an example of consuming the Flux<InputStream> and transforming it into a Mono.

fun decryptAndGetProcessingResult(flux: Flux<ByteArray>, cipher: Cipher): Mono<Result> =
    decryptAndGetInputStream(flux, cipher)
        // the following operator gets called at most once
        .flatMap { inputStream ->
            // wrap the blocking operation into mono
            // subscribed on another thread to avoid deadlocks
            Mono.fromCallable { 
                processInputStream(inputStream)
            }.subscribeOn(Schedulers.elastic())
        // to get mono out of flux we implement reduce operator
        // although it never gets called
        }.reduce { t, _ -> t }

Another advantage here is that the thread consuming the InputStream will not block waiting for the first chunk of data, since the stream is only emitted once that chunk has arrived.

Jan Volf

You can convert a Flux<String> of known size into a Mono<byte[]>, which in turn can be used to form an InputStream. Check this out (in Java):

Flux<String> stringFlux = ...;
stringFlux.collect(() -> new ByteArrayOutputStream(),
                   (baos, str) -> {
                       try {
                           baos.write(str.getBytes());
                       } catch (IOException e) {
                           // ByteArrayOutputStream never actually throws
                       }
                   })
          .map(baos -> new ByteArrayInputStream(baos.toByteArray()))
          .map(inputStream -> ... // call other library);

This requires a cold Flux<T>, as collect() will only run when the Flux completes. For a Flux<T> of unknown size (and assuming every String is a standalone object), it gets even simpler:

Flux<String> stringFlux = ...;
stringFlux.map(str -> new ByteArrayInputStream(str.getBytes()))
          .map(inputStream -> ... // call other library);
MuratOzkan
  • 1
    Flux.collect will only run on complete. If it's a hot publisher this won't really work. As you do not know when a complete is called, since it is reacting to an incoming JMS – Kevin Hussey Aug 08 '18 at 10:47
  • Didn't see that you were using a queue. Then, will every string coming from the Flux be an InputStream to be passed to this library? – MuratOzkan Aug 08 '18 at 11:56

You can reduce a Flux<DataBuffer> to a Mono<DataBuffer>, then convert it to an InputStream.

Example code for uploading a file to GridFS in WebFlux:

private GridFsTemplate gridFsTemplate;

public Mono<String> storeFile(FilePart filePart) {
    HttpHeaders headers = filePart.headers();
    String contentType = Objects.requireNonNull(headers.getContentType()).toString();

    return filePart.content()
            .reduce(DataBuffer::write)
            .map(DataBuffer::asInputStream)
            .map(input -> gridFsTemplate.store(input, filePart.filename(), contentType))
            .map(ObjectId::toHexString);
}
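For comparison, the aggregation can also be done with DataBufferUtils.join, Spring's own helper for joining data buffers; a sketch (note that both variants buffer the whole upload in memory before storing it):

// Equivalent to the reduce above; replaces the return statement:
return DataBufferUtils.join(filePart.content())
        .map(DataBuffer::asInputStream)
        .map(input -> gridFsTemplate.store(input, filePart.filename(), contentType))
        .map(ObjectId::toHexString);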
Lin CS