I have an akka-stream pipeline that fans out events (via BroadcastHub) that are pushed into the stream via a SourceQueueWithComplete.

All downstream consumers have a .buffer() inserted, which I would expect to keep the upstream buffers of the hub and the queue drained. Nevertheless, I still observe backpressure kicking in after the system has been running for a while.

Here's a (simplified) snippet:

import akka.NotUsed
import akka.stream.{Materializer, OverflowStrategy}
import akka.stream.scaladsl.{BroadcastHub, Keep, Sink, Source}

import scala.concurrent.ExecutionContext
import scala.util.Failure

class NotificationHub[Event](
  implicit materializer: Materializer,
  ecForLogging: ExecutionContext
) {

  // a SourceQueue to enqueue events and a BroadcastHub to allow multiple subscribers
  private val (queue, broadCastSource) =
    Source.queue[Event](
      bufferSize = 64,
      // we expect the buffer to never run full and if it does, we want
      // to log that asap, so we use OverflowStrategy.backpressure
      overflowStrategy = OverflowStrategy.backpressure
    ).toMat(BroadcastHub.sink)(Keep.both).run()

  // This keeps the BroadCastHub drained while there are no subscribers
  // (see https://doc.akka.io/docs/akka/current/stream/stream-dynamic.html ):
  broadCastSource.to(Sink.ignore).run()

  def notificationSource(p: Event => Boolean): Source[Unit, NotUsed] = {
    broadCastSource
      .collect { case event if p(event) => () }
      // this buffer is intended to keep the upstream buffers of
      // queue and hub drained:
      .buffer(
        // if a downstream consumer ever becomes too slow to consume,
        // only the latest two notifications are relevant
        size = 2,
        // doesn't really matter whether we drop head or tail
        // as all elements are the same (), it's just important not
        // to backpressure in case of overflow:
        OverflowStrategy.dropHead
      )
  }

  def propagateEvent(
    event: Event
  ): Unit = {
    queue.offer(event).onComplete {
      case Failure(e) =>
        // unexpected backpressure occurred!
        println(e.getMessage)
        e.printStackTrace()
      case _ =>
        ()
    }
  }

}
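
For context, subscribers and producers use this roughly as follows (a simplified sketch; the Int event type and the even/odd predicate are just placeholders):

import akka.actor.ActorSystem

implicit val system: ActorSystem = ActorSystem("notifications")
// since Akka 2.6, an implicit ActorSystem in scope also provides the implicit Materializer
import system.dispatcher // ExecutionContext for the logging in propagateEvent

val hub = new NotificationHub[Int]

// a subscriber that is only interested in even events:
hub.notificationSource(_ % 2 == 0)
  .runWith(Sink.foreach(_ => println("notified")))

// producers push events into the hub:
hub.propagateEvent(42)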

Since the doc for buffer() says that with DropHead it never backpressures, I would have expected the upstream buffers to remain drained. Yet I still end up with calls to queue.offer() failing because of backpressure.

Reasons that I could think of:

  1. Evaluation of the predicate p in .collect causes a lot of load and hence backpressure. This seems very unlikely because those are very simple non-blocking ops.
  2. The overall system is totally overloaded. This also seems rather unlikely.

I have a feeling I am missing something. Do I maybe need to add an async boundary via .async before or after buffer() to fully decouple the "hub" from possible heavy load that may occur somewhere further downstream?

MartinHH
  • I haven't looked into your problem in detail, but Akka Streams' purpose is having backpressure at all times. Upstream and downstream communicate to set the right rate of processing, and all built-in tools in Akka Streams are there to support keeping the backpressure; buffer is not an exception. – Mateusz Kubuszok Jul 22 '22 at 13:49
  • @MateuszKubuszok what you state somehow seems to contradict the scaladoc of various operators where it often states _"Backpressures when: never"_ (e.g. for `Flow.conflate()`) or the one for `Flow.buffer()` where it says that it never backpressures if the `OverflowStrategy` is `DropXYZ` . – MartinHH Jul 22 '22 at 14:09
  • It merely states that this operator will not introduce a slowdown to keep resources constrained (because it might have other means to not overflow them). It doesn't state that the whole stream never slows down to keep things constrained. If whatever is before or after it would need to regulate the speed, the whole stream will adjust, so it will still be backpressured (as required by ReactiveStreams spec). – Mateusz Kubuszok Jul 22 '22 at 14:27
  • If you're correct then the scaladoc would seem very inconsistent to me, because there are other operators (like `Flow.map()`) where it states _Backpressures when downstream backpressures_ (which to me seems like the accurate description of the behavior that you described). – MartinHH Jul 23 '22 at 15:13
  • Look into the documentation here - https://doc.akka.io/docs/akka/current/stream/operators/Source-or-Flow/conflate.html . It says that in case of a slower consumer it combines values from the fast producer - which technically is _not_ backpressure, since it doesn't tell the producer to slow down - but the stream as a whole is still working with constrained resources and, with the exception of this stage, is still able to adapt its speed. – Mateusz Kubuszok Jul 24 '22 at 09:13
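
To make the conflate behaviour described in the last comment concrete, here is a small standalone sketch (illustrative only, not taken from the hub code above): the producer keeps ticking at full rate while conflate sums up whatever the slow consumer has not taken yet, so the producer is never slowed down.

import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._

object ConflateDemo extends App {
  implicit val system: ActorSystem = ActorSystem("conflate-demo")

  Source.tick(0.millis, 10.millis, 1)
    .conflate(_ + _)       // merge elements while downstream is busy
    .throttle(1, 1.second) // simulate a slow consumer
    .runWith(Sink.foreach(n => println(s"consumed a batch of $n ticks")))
}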

1 Answer


So after more reading of the Akka docs and some experiments, I think I found the solution (sorry for maybe asking here too early).

To fully detach my code from any heavy load that may occur somewhere downstream, I need to ensure that any downstream code is not executed by the same actor as the .buffer() (e.g. by inserting .async).

For example, this code would eventually lead to the SourceQueue running full and then backpressuring:

val hub: NotificationHub[Int] = // ...
hub.notificationSource(_ => true)
  .map { x =>
    // simulate a slow consumer:
    Thread.sleep(250)
    x
  }
  .runWith(Sink.ignore)

Further inspection showed that this .map() was executed on the same thread (of the underlying stream actor) as the upstream .collect() (and .buffer()).
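
One way to observe this is a quick println check (a simplified sketch; the log lines are only there for illustration):

hub.notificationSource { _ =>
    println(s"predicate runs on ${Thread.currentThread().getName}")
    true
  }
  .map { x =>
    println(s"map runs on ${Thread.currentThread().getName}")
    Thread.sleep(250)
    x
  }
  .runWith(Sink.ignore)
// without an async boundary, collect, buffer, map and the sink are fused
// and executed by a single stream actor, so the sleeping map keeps that
// actor busy and backpressure propagates all the way back to the hub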

When inserting .async as shown below, the .buffer() would drop elements (as I had intended it to) and the upstream SourceQueue would remain drained:

val hub: NotificationHub[Int] = // ...
hub.notificationSource(_ => true)
  .async
  .map { x =>
    // simulate a slow consumer:
    Thread.sleep(250)
    x
  }
  .runWith(Sink.ignore)
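
A possible refinement (an untested sketch on my side): the boundary could also live inside notificationSource itself, so that no subscriber can forget to add it:

  def notificationSource(p: Event => Boolean): Source[Unit, NotUsed] =
    broadCastSource
      .collect { case event if p(event) => () }
      .buffer(2, OverflowStrategy.dropHead)
      // decouple the hub/queue side from whatever subscribers attach downstream:
      .async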
MartinHH