Akka Stream - Select Sink based on Element in Flow

Question

I'm creating a simple message delivery service using Akka stream. The service is just like mail delivery, where elements from source include destination and content like:

case class Message(destination: String, content: String)

and the service should deliver the messages to appropriate sink based on the destination field. I created a DeliverySink class to let it have a name:

case class DeliverySink(name: String, sink: Sink[String, Future[Done]])

Now, I instantiated two DeliverySink, let me call them sinkX and sinkY, and created a map based on their name. In practice, I want to provide a list of sink names and the list should be configurable.

The challenge I'm facing is how to dynamically choose an appropriate sink based on the destination field.

Eventually, I want to map Flow[Message] to a sink. I tried:

val sinkNames: List[String] = List("sinkX", "sinkY")
val sinkMapping: Map[String, DeliverySink] = 
   sinkNames.map { name => name -> DeliverySink(name, ???)}.toMap
Flow[Message].map { msg => msg.content }.to(sinks(msg.destination).sink)

but, obviously this doesn't work because we can't reference msg outside of map...

I guess this is not the right approach. I also thought about using filter with broadcast, but if the number of destinations grows to 100, I cannot write out every route by hand. What is the right way to achieve my goal?

[Edit]

Ideally, I would like to make destinations dynamic. So, I cannot statically type all destinations in filter or routing logic. If a destination sink has not been connected, it should create a new sink dynamically too.

Ramón J Romero y Vigil · Accepted Answer · 2018-02-13T23:30:23.967

If You Have To Use Multiple Sinks

Sink.combine would directly suit your existing requirements. If you attach an appropriate Flow.filter before each Sink, then each Sink will only receive the messages addressed to it.
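
A minimal sketch of that combination, assuming two destinations named "sinkX" and "sinkY" as in the question. The helper `only`, the object name `CombineExample`, and the printing stand-in sinks are made up for illustration; `Sink.combine`, `Broadcast`, and `Flow.filter` are the Akka Streams operators referred to above.

```scala
import akka.NotUsed
import akka.stream.Materializer
import akka.stream.scaladsl.{Broadcast, Flow, Sink, Source}

final case class Message(destination: String, content: String)

object CombineExample {
  // Hypothetical helper: wrap a content sink so it only sees messages for `name`.
  def only(name: String, sink: Sink[String, _]): Sink[Message, NotUsed] =
    Flow[Message].filter(_.destination == name).map(_.content).to(sink)

  // Stand-in sinks for illustration; real ones would write to a queue, file, etc.
  val sinkX: Sink[String, _] = Sink.foreach[String](c => println(s"X <- $c"))
  val sinkY: Sink[String, _] = Sink.foreach[String](c => println(s"Y <- $c"))

  // Every message is broadcast to both branches; each branch drops what is not its own.
  val routed: Sink[Message, NotUsed] =
    Sink.combine(only("sinkX", sinkX), only("sinkY", sinkY))(Broadcast[Message](_))

  // Run a batch of messages through the combined sink.
  def run(messages: List[Message])(implicit mat: Materializer): NotUsed =
    Source(messages).runWith(routed)
}
```

Note that every branch still inspects every message, so the filtering work grows with the number of sinks.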

Don't Use Multiple Sinks

In general I think it is bad design for the structure and content of streams to contain business logic. Your stream should be a thin veneer for back-pressured concurrency on top of business logic that lives in ordinary Scala/Java code.

In this particular case, I think it would be best to wrap your destination routing inside of a single Sink and the logic should be implemented inside of a separate function. For example:

val routeMessage : (Message) => Unit = 
  (message) => 
    if(message.destination equalsIgnoreCase "stdout")
      System.out println message.content
    else if(message.destination equalsIgnoreCase "stderr")
      System.err println message.content

val routeSink : Sink[Message, _] = Sink foreach routeMessage

Note how much easier it is to now test my routeMessage since it isn't inside of the stream: I don't need any akka testkit "stuff" to test routeMessage. I can also move the function to a Future or a Thread if my concurrency design were to change.
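
As a rough illustration of that testability, here is a sketch that needs nothing from Akka. It assumes a slightly different shape from the function above: `routeTo` is a hypothetical variant that takes its two output streams as parameters so a check can capture what was written, and `captured` is an illustrative helper.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

final case class Message(destination: String, content: String)

object RouteMessageCheck {
  // Hypothetical variant of routeMessage: same routing rules, outputs passed in.
  def routeTo(out: PrintStream, err: PrintStream): Message => Unit =
    message =>
      if (message.destination.equalsIgnoreCase("stdout")) out.println(message.content)
      else if (message.destination.equalsIgnoreCase("stderr")) err.println(message.content)

  // Route the messages into in-memory buffers and return what each stream received.
  def captured(messages: List[Message]): (String, String) = {
    val outBuf = new ByteArrayOutputStream()
    val errBuf = new ByteArrayOutputStream()
    val route  = routeTo(new PrintStream(outBuf, true), new PrintStream(errBuf, true))
    messages.foreach(route)
    (outBuf.toString.trim, errBuf.toString.trim)
  }
}
```

In production the same function would be built with `routeTo(System.out, System.err)` and handed to `Sink.foreach`.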

Many Destinations

If you have many destinations you can use a Map. Suppose, for example, you are sending your messages to AmazonSQS. You could define a function to convert a Queue Name to Queue URL and use that function to maintain a Map of already created names:

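import scala.collection.mutable
import com.amazonaws.services.sqs.AmazonSQS
import com.amazonaws.services.sqs.model.CreateQueueRequest

type QueueName = String

val nameToRequest : (QueueName) => CreateQueueRequest = ???  //implementation unimportant

type QueueURL = String

val nameToURL : (AmazonSQS) => (QueueName) => QueueURL = {
  //cache of queues already created; built once and shared by every call
  //(a plain mutable.Map, so not safe for concurrent use)
  val nameToURL = mutable.Map.empty[QueueName, QueueURL]

  (sqs) => (queueName) => nameToURL.get(queueName) match {
    case Some(url) => url
    case None => {
      //first message for this destination: create the queue, then remember its URL
      sqs.createQueue(nameToRequest(queueName))
      val url = sqs.getQueueUrl(queueName).getQueueUrl()

      nameToURL put (queueName, url)

      url
    }
  }
}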

Now you can use this non-stream function inside of a singular Sink:

val sendMessage : (AmazonSQS) => (Message) => Unit = 
  (sqs) => (message) => 
    sqs sendMessage {
      (new SendMessageRequest())
        .withQueueUrl(nameToURL(sqs)(message.destination))
        .withMessageBody(message.content)
    }

val sqs : AmazonSQS = ???

val messageSink = Sink foreach sendMessage(sqs)
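
The create-on-first-use lookup can be tried without AWS. The sketch below assumes a hypothetical `QueueClient` trait standing in for the SQS client (it is not part of any SDK), and uses `getOrElseUpdate` as a shorter form of the `get`/`put` match shown earlier.

```scala
import scala.collection.mutable

object QueueLookup {
  type QueueName = String
  type QueueURL  = String

  // Hypothetical stand-in for the SQS client; not part of any real SDK.
  trait QueueClient {
    def createQueue(name: QueueName): QueueURL
  }

  // Returns a lookup function that asks the client to create each queue at most once.
  // The cache is a plain mutable.Map, so this is not safe for concurrent use.
  def cachedLookup(client: QueueClient): QueueName => QueueURL = {
    val known = mutable.Map.empty[QueueName, QueueURL]
    name => known.getOrElseUpdate(name, client.createQueue(name))
  }
}
```

The cache lives inside the function returned by `cachedLookup`, so that function has to be created once and reused (for example, captured by the function passed to `Sink.foreach`) rather than rebuilt per message.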

Side Note

For destination you probably want to use something other than String. A coproduct is usually better because it can be used with case statements, and the compiler will warn you if you miss one of the possibilities:

sealed trait Destination

case object Out extends Destination
case object Err extends Destination
case object SomethingElse extends Destination

case class Message(destination: Destination, content: String)

//The compiler warns that this match is not exhaustive because SomethingElse
//doesn't have a case (an error when compiling with -Xfatal-warnings)
val routeMessage : (Message) => Unit = 
  (message) => message.destination match {
    case Out =>
      System.out.println(message.content)
    case Err =>
      System.err.println(message.content)
  }

Thanks for your answer. I added a new requirement. Let me know if you have any suggestions. — gyoho, Feb 13 '18 at 15:55
@gyoho If that is truly your requirement then I would suggest re-engineering the entire design. I've never seen a practical use case of streams where the set of Sinks is unknown at materialization time. — Ramón J Romero y Vigil, Feb 13 '18 at 17:04
Let's say the destination is a queue, and the `destination` field specifies a queue name. We do have a list of all the destination queue names in advance, so we can materialize all the queue sinks. However, if the list has 100 names, I have 100 case matches. Is there a better way to do this? — gyoho, Feb 13 '18 at 18:19
@gyoho But why would you need a separate sink for each Queue??? You could do something like `Sink.fold` to create a single Sink and then collect the message values into a single `Map` of `Destination -> Queue`. This Map would be accessible via a `Future[Map[String, Queue[Message]]]` once the stream is complete. — Ramón J Romero y Vigil, Feb 13 '18 at 21:43
If I use [SQS connector from Alpakka](https://developer.lightbend.com/docs/alpakka/current/sqs.html), I need to provide a queueUrl to create Source/Sink. Maybe, I'm missing something... I would really appreciate it if you could give me some example or direct me to a resource. — gyoho, Feb 13 '18 at 21:54
@gyoho I think alpakka is designed for very simple use cases. So using multiple Sinks to conform to alpakka isn't the best option. I've updated my answer to demonstrate using AmazonSQS sdk using the pattern I previously described of using a single Sink. — Ramón J Romero y Vigil, Feb 13 '18 at 23:32
A disadvantage to using `Sink.combine` with `Flow.filter` is that the filter must be repeated for each sink. This could turn an `O(log(n))` map operation into an `O(n*log(n))` filter operation. — Owen, Feb 13 '18 at 23:55
@Owen Agreed. One of the many reasons I suggest not using multiple Sinks in the first place. — Ramón J Romero y Vigil, Feb 14 '18 at 00:17
@gyoho Same general idea, the structure of your streams should not be "business logic". — Ramón J Romero y Vigil, Feb 14 '18 at 19:22

Leo C · Answer 2 · 2018-02-14T01:11:30.410

Given your requirement, maybe you want to consider multiplexing your stream source into substreams using groupBy:

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl._
import akka.util.ByteString
import akka.{NotUsed, Done}
import akka.stream.IOResult
import scala.concurrent.Future
import java.nio.file.Paths
import java.nio.file.StandardOpenOption._

implicit val system = ActorSystem("sys")
implicit val materializer = ActorMaterializer()
import system.dispatcher

case class Message(destination: String, content: String)
case class DeliverySink(name: String, sink: Sink[ByteString, Future[IOResult]])

val messageSource: Source[Message, NotUsed] = Source(List(
  Message("a", "uuu"), Message("a", "vvv"),
  Message("b", "xxx"), Message("b", "yyy"), Message("b", "zzz")
))

val sinkA = DeliverySink("sink-a", FileIO.toPath(
  Paths.get("/path/to/sink-a.txt"), options = Set(CREATE, WRITE)
))
val sinkB = DeliverySink("sink-b", FileIO.toPath(
  Paths.get("/path/to/sink-b.txt"), options = Set(CREATE, WRITE)
))

val sinkMapping: Map[String, DeliverySink] = Map("a" -> sinkA, "b" -> sinkB)

val totalDests = 2

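messageSource.map(m => (m.destination, m)).
  // one substream per destination, at most totalDests substreams
  groupBy(totalDests, _._1).
  // buffer each substream into a list; it is emitted only once the source completes
  fold(("", List.empty[Message])) {
    case ((_, list), (dest, msg)) => (dest, msg :: list)
  }.
  // write each group's contents to the sink registered for its destination
  mapAsync(parallelism = totalDests) {
    case (dest: String, msgList: List[Message]) =>
      // messages were prepended above, so reverse to restore arrival order
      Source(msgList.reverse).map(_.content).map(ByteString(_)).
        runWith(sinkMapping(dest).sink)
  }.
  mergeSubstreams.
  runWith(Sink.ignore)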
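
To make the grouping concrete, the plain-collections sketch below computes what each destination's sink should receive for the sample messages above, assuming the target files did not exist beforehand. The object and value names are illustrative only. Because each content string is written as a raw `ByteString` with no separator, the contents end up concatenated.

```scala
final case class Message(destination: String, content: String)

object GroupingSketch {
  val messages: List[Message] = List(
    Message("a", "uuu"), Message("a", "vvv"),
    Message("b", "xxx"), Message("b", "yyy"), Message("b", "zzz")
  )

  // destination -> concatenated contents, in arrival order within each destination
  val expectedFileContents: Map[String, String] =
    messages.groupBy(_.destination).map {
      case (dest, msgs) => dest -> msgs.map(_.content).mkString
    }
}
```

For this input the map holds "uuuvvv" for destination "a" and "xxxyyyzzz" for destination "b".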

The case match should be: `{ msg => Source.single(msg).map(_.payload).runWith(sinkMapping(msg.destination).sink)}`. But, I think groupBy is what I was looking for! The only concern I have is how much overhead it could have using `Source.single(msg)` — gyoho, Feb 13 '18 at 21:42
After applying `groupBy(totalDests, _.destination)`, you should have tuples of (dest, List[Message]), hence `sinkMapping(dest)` is no different from `sinkMapping(msg.destination)`. In any case, my goal was to outline the use of `groupBy` with `mapAsync` and you're free to refine it to your actual stream data structure. — Leo C, Feb 13 '18 at 22:30
The messageSource type is `Source[Message, NotUsed]`. I don't think I'll get `(dest, List[Message])`. I got this compilation error - Error:(42, 16) constructor cannot be instantiated to expected type; found : (T1, T2) required: Message case (dest, msgList) => Could you tell me how I can get tuples of (dest, List[Message])? — gyoho, Feb 13 '18 at 22:41
Surprisingly, `groupBy` doesn't quite work with case class. I've replaced the example with a testable one by modifying the Sink type to `Sink[ByteString, Future[IOResult]]` and mapping case classes to tuples. In this example, two files will be created with grouped message content in accordance with the Message's destination. — Leo C, Feb 14 '18 at 01:08
