passing an Akka stream to an upstream service to populate

Question

I need to call an upstream service (Azure Blob Service) that pushes data into an OutputStream, and then relay that data back to the client through Akka. Without Akka (with plain servlet code), I'd just get the ServletOutputStream and pass it to the Azure service's method.

The closest I have managed to come up with, and it is clearly wrong, is something like this:

        Source<ByteString, OutputStream> source = StreamConverters.asOutputStream().mapMaterializedValue(os -> {
            blobClient.download(os);
            return os;
        });

        ResponseEntity responseEntity = HttpEntities.create(ContentTypes.APPLICATION_OCTET_STREAM, preAuthData.getFileSize(), source);

        sender().tell(new RequestResult(responseEntity, StatusCodes.OK), self());

The idea is that I call an upstream service, which populates the OutputStream when I invoke blobClient.download(os).

It seems that the lambda gets called and returns, but afterwards the request fails, apparently because there is no data. Perhaps the lambda is not supposed to do the work itself, but should instead return some object that does the work? I'm not sure.

How does one do this?

What is the behaviour of `download`? Does it stream data into `os` and only return once data is done being written in? — Alec, Apr 28 '20 at 02:49

Alec · Answer 1 · 2020-04-30T11:54:44.127

The real issue here is that the Azure API is not designed for back-pressuring. There is no way for the output stream to signal back to Azure that it is not ready for more data. To put it another way: if Azure pushes data faster than you are able to consume it, there will have to be some ugly buffer overflow failure somewhere.

Accepting this fact, the next best thing we can do is:

1. Use Source.lazySource to start downloading data only when there is downstream demand (that is, when the source is being run and data is being requested).
2. Put the download call on some other thread so that it continues executing without blocking the source from being returned. One way to do this is with a Future (I'm not sure what Java best practices are, but it should work fine either way). Although it won't matter initially, you may need to choose an execution context other than system.dispatcher; it all depends on whether download is blocking or not.

I apologize in advance if this Java code is malformed - I use Akka with Scala, so this is all from looking at the Akka Java API and Java syntax reference.

ResponseEntity responseEntity = HttpEntities.create(
  ContentTypes.APPLICATION_OCTET_STREAM,
  preAuthData.getFileSize(),

  // Wait until there is downstream demand to initialize the source...
  Source.lazySource(() -> {
    // Pre-materialize the outputstream before the source starts running
    Pair<OutputStream, Source<ByteString, NotUsed>> pair =
      StreamConverters.asOutputStream().preMaterialize(system);

    // Start writing into the download stream in a separate thread
    Futures.future(() -> { blobClient.download(pair.first()); return pair.first(); }, system.getDispatcher());

    // Return the source - it should start running since `lazySource` indicated demand
    return pair.second();
  })
);

sender().tell(new RequestResult(responseEntity, StatusCodes.OK), self());

Fantastic. thanks much. A small edit to your example is: Futures.future(() -> { blobClient.download(pair.first()); return pair.first(); }, system.getDispatcher()); — MeBigFatGuy, Apr 30 '20 at 05:05

score 0 · Answer 2 · answered Apr 15 '20 at 12:33

The OutputStream in this case is the "materialized value" of the Source and it will only be created once the stream is run (or "materialized" into a running stream). Running it is out of your control since you hand the Source to Akka HTTP and that will later actually run your source.

.mapMaterializedValue(matval -> ...) is usually used to transform the materialized value, but since it is invoked as part of materialization you can also use it for side effects such as sending the matval in a message, just as you have figured out. There isn't necessarily anything wrong with that, even if it looks funky. It is important to understand, though, that the stream will not complete its materialization and start running until that lambda returns. This causes problems if download() blocks rather than forking off some work on a different thread and returning immediately.

There is however another solution: Source.preMaterialize(). It materializes the source and gives you a Pair of the materialized value and a new Source that can be used to consume the already started source:

Pair<OutputStream, Source<ByteString, NotUsed>> pair = 
  StreamConverters.asOutputStream().preMaterialize(system);
OutputStream os = pair.first();
Source<ByteString, NotUsed> source = pair.second();

Note that there are a few additional things to think about in your code. Most importantly, if the blobClient.download(os) call blocks until it is done and you call it from the actor, you must make sure that your actor does not starve the dispatcher and stop other actors in your application from executing (see the Akka docs: https://doc.akka.io/docs/akka/current/typed/dispatchers.html#blocking-needs-careful-management ).
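To make the blocking concern concrete, here is a minimal sketch in plain java.io, with no Akka involved. A PipedOutputStream with a small fixed buffer stands in for the OutputStream that Akka hands out, and the download method below is a hypothetical stand-in for blobClient.download(os) that only returns once everything has been written. The class name, the sizes, and the single-thread pool are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingDownloadSketch {

    // Hypothetical stand-in for blobClient.download(os): writes all the data
    // and returns only when it is done.
    static void download(OutputStream os, int totalBytes) throws IOException {
        byte[] chunk = new byte[1024];
        int remaining = totalBytes;
        while (remaining > 0) {
            int n = Math.min(chunk.length, remaining);
            os.write(chunk, 0, n); // blocks while the pipe's buffer is full
            remaining -= n;
        }
    }

    // Runs the blocking download on a dedicated pool and consumes the bytes
    // on the calling thread. Returns the number of bytes consumed.
    static long transfer(int totalBytes, int bufferSize) {
        ExecutorService blockingPool = Executors.newSingleThreadExecutor();
        try (PipedInputStream in = new PipedInputStream(bufferSize)) {
            PipedOutputStream out = new PipedOutputStream(in);

            // Fork the blocking call so the caller is free to start reading.
            CompletableFuture<Void> writer = CompletableFuture.runAsync(() -> {
                // Closing the stream (even on failure) signals end of data.
                try (OutputStream os = out) {
                    download(os, totalBytes);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, blockingPool);

            long consumed = 0;
            byte[] buf = new byte[256];
            int read;
            while ((read = in.read(buf)) != -1) {
                consumed += read; // draining the buffer lets the writer continue
            }
            writer.join(); // surface any failure from the download thread
            return consumed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            blockingPool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Far more data than the 1 KiB buffer can hold at once.
        System.out.println(transfer(100_000, 1024));
    }
}
```

The writer thread stalls whenever the buffer is full and resumes as the reader drains it, which is roughly the behaviour to expect from the Akka OutputStream. If download were called on the reading thread instead, it would fill the buffer and then wait forever for a reader that never gets to run. In an actor, the equivalent move would be to hand the blocking call to a dedicated dispatcher rather than run it on the actor's own thread.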

Thanks for the response. I don't see how this could possibly work. Where do the bytes go when blobClient.download(os) is called (if I am calling it myself)? Imagine there's a terabyte of data waiting to be written. It seems to me that the blobClient.download call has to be invoked from the sender.tell call, so that this is basically an IOUtils.copy-like operation. Using preMaterialize, I can't see how that happens. — MeBigFatGuy, Apr 15 '20 at 17:21
The OutputStream has an internal buffer. It will accept writes until that buffer fills up; if the async downstream has not started consuming elements by then, it will block the writing thread (which is why I mentioned that it is important to handle blocking). — johanandren, Apr 16 '20 at 09:20
But if I preMaterialize and get the OutputStream, then it is my code that is calling blobClient.download(os), correct? That means it has to complete before I can proceed, which is impossible. — MeBigFatGuy, Apr 16 '20 at 12:23
If download(os) does not fork off a thread, you will have to deal with it being blocking and make sure that it doesn't stop some other operation. One way would be to fork a thread to do the work; another would be to respond from the actor first and then do the blocking work there. In that case you must make sure the actor does not starve other actors, see the link at the end of my answer. — johanandren, Apr 17 '20 at 07:02
At this point I'm just trying to get it to work at all. It can't even process a 10 byte file. — MeBigFatGuy, Apr 17 '20 at 22:04
