I have implemented a custom component in Akka Streams which takes elements as input, groups and merges them based on a key, and sends them out through one of a dozen outlets. You can think of this component as a kind of GroupBy that partitions the flow into separate outlets of the stage rather than into subflows. In addition to partitioning incoming elements, it merges them into one element, i.e. there is some buffering happening inside the component, so one element in does not necessarily mean one element out through an outlet.

Below is a simplified implementation of said component.

class CustomGroupBy[A, B](k: Int, f: A => Int) extends GraphStage[AmorphousShape] {

  val in = Inlet[A]("CustomGroupBy.in")
  val outs = (0 until k).map(i => Outlet[B](s"CustomGroupBy.$i.out"))

  override val shape = new AmorphousShape(scala.collection.immutable.Seq(in), outs)

  /* ... */
}

I now want to connect each outlet of that component to a different Sink and combine the materialized values of all these sinks.

I have tried a few things with the graph DSL, but have not quite managed to get it working. Would anyone be so kind as to provide me with a snippet to do that or point me in the right direction?

Thanks in advance!

user510159
  • Can you give a brief code example of why the Graph DSL did not work for you? I would say that if you have all your ports connected in the graph, it should work. – hveiga Mar 30 '17 at 14:42

2 Answers

You most likely want the built-in Broadcast stage. Example usage:

val bcast = builder.add(Broadcast[Int](2))

in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
            bcast ~> f4 ~> merge
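
For reference, a self-contained version of the broadcast-and-filter approach might look like the sketch below. The object name `BroadcastFilterSketch`, the two threshold filters and the use of `Sink.seq` are assumptions made purely for illustration, and `ActorMaterializer` assumes an Akka 2.4/2.5-era setup.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, RunnableGraph, Sink, Source}
import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical example: every element is broadcast to both branches,
// and each branch filters out what it does not want.
object BroadcastFilterSketch {
  def run(): (immutable.Seq[Int], immutable.Seq[Int]) = {
    implicit val system: ActorSystem = ActorSystem("broadcast-filter-sketch")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val smallSink = Sink.seq[Int]
      val largeSink = Sink.seq[Int]
      val graph = RunnableGraph.fromGraph(
        // (_, _) combines the two sinks' materialized values into a tuple
        GraphDSL.create(smallSink, largeSink)((_, _)) { implicit b => (small, large) =>
          import GraphDSL.Implicits._
          val bcast = b.add(Broadcast[Int](2))
          Source(1 to 6) ~> bcast
          bcast ~> Flow[Int].filter(_ <= 3) ~> small
          bcast ~> Flow[Int].filter(_ > 3) ~> large
          ClosedShape
        })
      val (smallF, largeF) = graph.run()
      (Await.result(smallF, 5.seconds), Await.result(largeF, 5.seconds))
    } finally system.terminate()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Each branch sees every element and discards most of them, which is the per-flow cost raised in the comments below.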
Ramón J Romero y Vigil
  • Thanks for your answer! Unfortunately, I do not want the broadcast because I plan to potentially have many outgoing flows and it seems expensive to broadcast and filter for every flow. Is it your experience that it is not expensive? Can you suggest something else? – user510159 Mar 30 '17 at 13:49
  • You are welcome. Unfortunately my only experience has been with `broadcast`. It might be easier to replicate your `Source`, then you could just materialize N streams... – Ramón J Romero y Vigil Mar 30 '17 at 13:52

You probably want the akka.stream.scaladsl.Partition[T](outputPorts: Int, partitioner: T ⇒ Int) stage.
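
As a rough illustration of how `Partition` routes each element to exactly one outlet, here is a minimal sketch. The object name `PartitionSketch`, the even/odd partitioner and the `Sink.seq` sinks are assumptions for the example, not part of the question; `ActorMaterializer` assumes an Akka 2.4/2.5-era setup.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{GraphDSL, Partition, RunnableGraph, Sink, Source}
import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical example: split 1..6 into even and odd numbers.
object PartitionSketch {
  def run(): (immutable.Seq[Int], immutable.Seq[Int]) = {
    implicit val system: ActorSystem = ActorSystem("partition-sketch")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val evenSink = Sink.seq[Int]
      val oddSink = Sink.seq[Int]
      val graph = RunnableGraph.fromGraph(
        GraphDSL.create(evenSink, oddSink)((_, _)) { implicit b => (evens, odds) =>
          import GraphDSL.Implicits._
          // the partitioner returns the index of the outlet to use: 0 for even, 1 for odd
          val partition = b.add(Partition[Int](2, n => n % 2))
          Source(1 to 6) ~> partition.in
          partition.out(0) ~> evens
          partition.out(1) ~> odds
          ClosedShape
        })
      val (evensF, oddsF) = graph.run()
      (Await.result(evensF, 5.seconds), Await.result(oddsF, 5.seconds))
    } finally system.terminate()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Unlike Broadcast, no element is copied to an outlet that would only filter it out again.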

EDIT:

To connect all the ports, and keep the materialized values, you have to give your stages as parameters to the GraphDSL.create method.

This allows you to define a combiner for the materialized values, and the stages you passed in are handed to the last argument (the build block) already added to the GraphDSL builder. Note that this create method is overloaded per number of stages rather than taking a varargs parameter, so it may not be possible to pass 14 different stages this way.

Assuming some names for your stages, here is how I would implement it, in the case of 3 outputs:

val runnable = RunnableGraph.fromGraph(
  GraphDSL.create(
    source, customGroupBy, sink1, sink2, sink3)(combiner) {  //the combiner is the function to combine the materialized values
      implicit b => //this is the builder, needed as implicit to make the connections 
      (src, cgb, s1, s2, s3) => //here are the stages added to the builder
      import GraphDSL.Implicits._

      src.out ~> cgb.in
      // connect each outlet of the custom stage to the inlet of one sink (outlet ~> inlet)
      cgb.outlets.zip(List(s1, s2, s3).map(_.in)).foreach {
        case (out, in) => out ~> in
      }

      ClosedShape
    }
  )
)

Remember that if you don't need the materialized value of one of the stages, you can add that stage inside the DSL block instead, with val cgb = b.add(customGroupBy).
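
If the number of sinks exceeds what the fixed-arity overloads accept, newer Akka releases also provide a `GraphDSL.create` overload that takes an `immutable.Seq` of graphs sharing the same shape and materialized type, and materializes a `Seq` of their values. The following is a minimal sketch, assuming that overload is available in your Akka version and using the built-in `Partition` as a stand-in for the custom stage; `ManySinksSketch`, `k = 4` and the `Sink.seq` sinks are illustrative choices only.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{GraphDSL, Partition, RunnableGraph, Sink, Source}
import scala.collection.immutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical example: k sinks passed as one Seq instead of k separate parameters.
object ManySinksSketch {
  def run(): immutable.Seq[immutable.Seq[Int]] = {
    implicit val system: ActorSystem = ActorSystem("many-sinks-sketch")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    import system.dispatcher // execution context for Future.sequence
    try {
      val k = 4
      val sinks: immutable.Seq[Sink[Int, Future[immutable.Seq[Int]]]] =
        List.fill(k)(Sink.seq[Int])

      val graph = RunnableGraph.fromGraph(
        GraphDSL.create(sinks) { implicit b => sinkShapes =>
          import GraphDSL.Implicits._
          // Partition stands in for the custom stage (an assumption of this sketch)
          val partition = b.add(Partition[Int](k, n => n % k))
          Source(0 until 12) ~> partition.in
          sinkShapes.zipWithIndex.foreach { case (sink, i) => partition.out(i) ~> sink }
          ClosedShape
        })

      // one Future that completes once every sink has completed
      val allDone: Future[immutable.Seq[immutable.Seq[Int]]] = Future.sequence(graph.run())
      Await.result(allDone, 5.seconds)
    } finally system.terminate()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Because every sink here materializes a `Future`, `Future.sequence` gives a single value to wait on, which may be enough when the materialized values are only needed to detect that processing has finished.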

Cyrille Corpet
  • Yes, that seems like what I want. I have to write my own due to the merging I'm doing, though, but as a black box, the component will look exactly like a Partition stage. My problem is indeed connecting the outputs to the sinks and then combining the materialized values from each sink. – user510159 Mar 31 '17 at 07:11
  • Nice! Thank you! Is there a way to somehow combine the sinks before passing them to create or create them inside it to get away from this limitation of 22 maximum parameters? – user510159 Mar 31 '17 at 11:37
  • Let me clarify my previous question. Each of my sinks materializes an ActorRef (for an ActorSubscriber). I do not need to materialize the ActorRefs for anything other than waiting on all of them being terminated so I know that the work is done. In other words, I may not need the materialized values for the sinks if I can somehow know that the processing is done through some other means. – user510159 Mar 31 '17 at 11:42
  • If your materializedValue is an `ActorRef`, it is created synchronously, so the materialization of your flow ensures the creation of the actors. If it is a `Future[ActorRef]`, it gets more complicated. One ugly way to do it would be to chunk your sink list into fewer `AmorphousShape`s (using `GraphDSL.create`), each with several inputs, but only one materialized value which completes when all sinks inside it have completed, and then put those (fewer) shapes into your main `GraphDSL.create`. It is really ugly, but it does the job. – Cyrille Corpet Mar 31 '17 at 12:15
  • Interesting. Is there no other way to combine the sinks' materialized values? Maybe the solution is to do it outside of GraphDSL.create, e.g. create one actor doing termination watch on all the sinks' materialized actors and then using that actor to form a "master" sink which completes when all the actors it watches are terminated? – user510159 Mar 31 '17 at 14:39
  • @CyrilleCorpet: Is there a way to do the same, but instead of closing the shape, turn the outputs into Sources that can later be used as an input to another graph (source.via(someStage))? Basically what Source.fromGraph does, but producing a Seq of sources. – EugeneMi Jun 12 '17 at 18:39
  • @EugeneMi You could create a custom shape with multiple outputs and no input (say, a MultiSourceShape). However, there is no standard way to do this. – Cyrille Corpet Jun 12 '17 at 19:49
  • @CyrilleCorpet: AmorphousShape already has multiple outputs. My problem is that I cannot find a way to wire the outputs to anything but sinks in a RunnableGraph (what you are doing). Instead I would like to find a way to connect the outputs to a source or a flow. – EugeneMi Jun 12 '17 at 20:48
  • Whenever you need a more advanced design than linear flows (i.e. `Source -> Flow -> Sink`, possibly with merged sources or broadcast sinks), you'll need to use `GraphDSL`. Otherwise, it would complicate the API too much without great added value. Personally, if I need a specific shape in different parts of my code, I define it in the same manner that `FlowShape` (for instance) is defined, to avoid using the too generic `AmorphousShape`. But this doesn't change the `GraphDSL` part. – Cyrille Corpet Jun 12 '17 at 21:15