
I am looking for a way to easily reuse akka-stream flows.

I treat the Flow I intend to reuse as a function, so I would like to keep its signature like:

Flow[Input, Output, NotUsed]

Now, when I use this flow, I would like to be able to 'call' it and keep the result, together with the input, for further processing.

So I want to start with a Flow emitting [Input], apply my flow, and proceed with a Flow emitting [(Input, Output)].

example:

val s: Source[Int, NotUsed] = Source(1 to 10)

val stringIfEven = Flow[Int].filter(_ % 2 == 0).map(_.toString)

val via: Source[(Int, String), NotUsed] = ???

Now this is not possible in a straightforward way, because combining the flow with .via() would give me a Flow emitting just [Output]:

val via: Source[String, NotUsed] = s.via(stringIfEven)

An alternative is to make my reusable flow emit [(Input, Output)], but that requires every flow to push its input through all of its stages and makes my code look bad.
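
For illustration, this is roughly what the reusable flow from the example below would have to look like in that style (a hypothetical rewrite, not my actual code):

val stringIfEvenTupled: Flow[Int, (Int, String), NotUsed] =
  Flow[Int]
    .filter(_ % 2 == 0)
    .map(i => i -> i.toString) // every further stage would have to keep carrying the Int along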

So I came up with a combiner like this:

import akka.NotUsed
import akka.stream.FlowShape
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Zip}

def tupledFlow[In, Out](flow: Flow[In, Out, _]): Flow[In, (In, Out), NotUsed] = {
  Flow.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._

    val broadcast = b.add(Broadcast[In](2))
    val zip = b.add(Zip[In, Out])

    // one copy of each input goes straight to Zip, the other through the wrapped flow
    broadcast.out(0) ~> zip.in0
    broadcast.out(1) ~> flow ~> zip.in1

    FlowShape(broadcast.in, zip.out)
  })
}

that broadcasts the input to the flow and, in a parallel line, directly to the 'Zip' stage, where I join the values into a tuple. It can then be applied elegantly:

val tupled: Source[(Int, String), NotUsed] = s.via(tupledFlow(stringIfEven))

Everything works great, but when the given flow performs a 'filter' operation, this combiner gets stuck and stops processing further events.

I guess that is due to the 'Zip' behaviour, which requires all of its inputs to emit: in my case one branch passes the given element directly, so the other branch cannot drop that element with filter(). When it does, the whole flow stops because Zip keeps waiting for a push that never comes.

Is there a better way to achieve flow composition? Is there anything I can do in my tupledFlow to get the desired behaviour when 'flow' drops elements with 'filter'?

  • The main problem of the concept here is that `Flow[T, U, ...]` is not a function. For each input element it may return 0, 1, or more output elements. It may even keep back input elements and use them only later when more data is available. For this reason it is impossible to provide this feature generically if the wrapped flow doesn't support it itself. It can work generically, if it is strictly enforced that the wrapped `Flow` is a one-to-one flow that actually works like a function (but filter doesn't work then). Usually, using `mapAsync` in such cases is a simpler way. – jrudolph Dec 29 '16 at 07:47
  • yes, you're right. The problem would happen if my reusable flow returned N elements. Stating the assumption that the wrapped `Flow` may output 0 or 1 elements for every input element would allow writing a `Zip` operator with different semantics, which zips with the input only when the wrapped `Flow` outputs, and skips the element entirely when the wrapped `Flow` does not push anything. – Tomasz Bartczak Dec 29 '16 at 08:04
  • Even that would be hard to do, because the pulling and pushing of the wrapped flow does not happen synchronously. You cannot check whether "the wrapped `Flow` is not pushing any element" - it could just be slow or buffered, etc. – jrudolph Dec 30 '16 at 10:42

2 Answers


Two possible approaches - with debatable elegance - are:

1) avoid filtering stages altogether, turning your flow into a Flow[Int, Option[String], NotUsed]. This way you can apply your zipping wrapper around your whole graph, as was your original plan. However, the code looks more tainted, and there is added overhead from passing Nones around.

val stringIfEvenOrNone = Flow[Int].map{
  case x if x % 2 == 0 => Some(x.toString)
  case _ => None
}

val tupled: Source[(Int, String), NotUsed] = s.via(tupledFlow(stringIfEvenOrNone)).collect{
  case (num, Some(str)) => (num,str)
}

2) separate the filtering and transforming stages, and apply the filtering ones before your zipping wrapper. Probably a more lightweight and better compromise.

val filterEven = Flow[Int].filter(_ % 2 == 0)

val intToString = Flow[Int].map(_.toString)

val tupled: Source[(Int, String), NotUsed] = s.via(filterEven).via(tupledFlow(intToString))

EDIT

3) Posting another solution here for clarity, as per the discussions in the comments.

This flow wrapper emits each element produced by the given flow, paired with the original input element that generated it. It works for any kind of inner flow (emitting 0, 1 or more elements for each input).

  def tupledFlow[In,Out](flow: Flow[In, Out, _]): Flow[In, (In,Out), NotUsed] =
    Flow[In].flatMapConcat(in => Source.single(in).via(flow).map( out => in -> out))
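
A quick usage sketch with the `stringIfEven` flow from the question: inputs dropped by the filter simply produce no pair, while matching inputs come out tupled.

val tupled: Source[(Int, String), NotUsed] = s.via(tupledFlow(stringIfEven))
// emits (2,"2"), (4,"4"), (6,"6"), (8,"8"), (10,"10")
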
Stefano Bonetti
  • Yes, option 0) is considerably the least harmful of all. However, it is always dangerous to have an API that is accessible but *shouldn't* be used when called in a `TupledFlow` wrapper. Not what we would expect from composable pieces of code. – Tomasz Bartczak Dec 29 '16 at 08:08
  • I think that the ultimate solution is a dedicated operator that would behave like I described in http://stackoverflow.com/questions/41366030/elegant-way-of-reusing-akka-stream-flows#comment69952575_41366030 – Tomasz Bartczak Dec 29 '16 at 08:08
  • Agreed, given the strong assumptions, this TupledFlow is definitely not something I would share in - e.g. - a standalone library. But it would still make some sense as an internally reusable graph stage in your project. – Stefano Bonetti Dec 29 '16 at 08:25
  • We tried to progress with using 3) as an option, but it turns out that if the `wrappedFlow` has e.g. some `mapAsync()` steps to improve the parallelism of a given step, they do not have any effect, since we are calling this flow on a `Source.single()`. So this wrapper is nice but will limit the parallelism. – Tomasz Bartczak Jan 05 '17 at 14:26
  • I guess we need a `flatMapConcatAsync` on the Flow, with the same API and semantics as flatMapConcat but with controlled parallelism – Tomasz Bartczak Jan 05 '17 at 14:34
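
There is no such operator out of the box, as far as I know; a rough sketch in that direction (the name `tupledFlowMerged` is only illustrative) could lean on the existing `flatMapMerge`, at the cost of not preserving the input order of the emitted pairs:

def tupledFlowMerged[In, Out](parallelism: Int)(flow: Flow[In, Out, _]): Flow[In, (In, Out), NotUsed] =
  Flow[In].flatMapMerge(parallelism, in => Source.single(in).via(flow).map(out => in -> out))

It still materializes one sub-stream per element, but up to `parallelism` of them run concurrently.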

I came up with an implementation of TupledFlow that works when the wrapped Flow uses filter() or mapAsync(), and when the wrapped Flow emits 0, 1 or N elements for every input:

import akka.NotUsed
import akka.stream.Materializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.{ExecutionContext, Future}

def tupledFlow[In, Out](flow: Flow[In, Out, _])(implicit materializer: Materializer, executionContext: ExecutionContext): Flow[In, (In, Out), NotUsed] = {
  val v: Flow[In, Seq[(In, Out)], NotUsed] = Flow[In].mapAsync(4) { in: In =>
    // run the wrapped flow for a single element and collect everything it emits
    val outFuture: Future[Seq[Out]] = Source.single(in).via(flow).runWith(Sink.seq)
    val bothFuture: Future[Seq[(In, Out)]] = outFuture.map(seqOfOut => seqOfOut.map((in, _)))
    bothFuture
  }
  // flatten the per-element sequences back into a plain stream of pairs
  val onlyDefined: Flow[In, (In, Out), NotUsed] = v.mapConcat[(In, Out)](seq => seq.to[scala.collection.immutable.Iterable])
  onlyDefined
}
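
A usage sketch, assuming the 2.4/2.5-era Akka APIs where an ActorMaterializer is created explicitly; `asyncStringIfEven` is a hypothetical wrapped flow that both filters and uses mapAsync:

import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Source}
import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("tupled-flow-example")
implicit val materializer: ActorMaterializer = ActorMaterializer()
import system.dispatcher

// hypothetical wrapped flow: drops odd numbers, converts the rest asynchronously
val asyncStringIfEven: Flow[Int, String, NotUsed] =
  Flow[Int].filter(_ % 2 == 0).mapAsync(2)(i => Future.successful(i.toString))

val tupled: Source[(Int, String), NotUsed] =
  Source(1 to 10).via(tupledFlow(asyncStringIfEven))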

The only drawback I see here is that I am instantiating and materializing a flow for every single element - just to get the notion of 'calling a flow as a function'.

I didn't do any performance tests on that - however, since the heavy lifting is done in the wrapped Flow, which is executed in a future, I believe this will be ok.

This implementation passes all the tests from https://gist.github.com/kretes/8d5f2925de55b2a274148b69f79e55ac#file-tupledflowspec-scala

  • If this is what you're after, you could probably get away with `def tupledFlow[In,Out](flow: Flow[In, Out, _]): Flow[In, (In,Out), NotUsed] = { Flow[In].flatMapConcat(in => Source.single(in).via(flow).map( out => in -> out)) }` – Stefano Bonetti Dec 29 '16 at 14:55
  • Yes, I guess that is what I am after. It satisfies every need and it will properly handle every behaviour of the wrapped `Flow`. – Tomasz Bartczak Dec 30 '16 at 08:21
  • Your implementation is concise and minimal. Thanks for that. I guess this is the proper answer for my question – Tomasz Bartczak Dec 30 '16 at 08:24