I've found that my Akka Streams program has unexpectedly high CPU usage.

Here is a simple example:

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

implicit val system: ActorSystem = ActorSystem.create("QuickStart")
implicit val materializer: ActorMaterializer = ActorMaterializer()

Source.repeat(Unit)
  .to(Sink.ignore)
  .run()

The code above runs the source and the sink in the same actor.

It uses about 105% CPU on my laptop, which is what I expected.

Then I added an async boundary:

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

implicit val system: ActorSystem = ActorSystem.create("QuickStart")
implicit val materializer: ActorMaterializer = ActorMaterializer()

Source.repeat(Unit)
  .async // <------ async boundary here
  .to(Sink.ignore)
  .run()

This version uses about 600% CPU on my 4-core/8-thread laptop.

I expected that, with an async boundary, the stream would run in 2 separate actors and use a little more than 200% CPU. Instead it uses far more than 200%.

What could cause an async boundary to use that much CPU?

lxohi
  • You could try profiling to find out (or just use `jstack` on the shell). The most likely cause is that your thread pool is too big. Try setting `akka.actor.default-dispatcher.fork-join-executor.parallelism-factor = 1`. – jrudolph Oct 24 '18 at 09:01

1 Answer


The default executor for `akka.actor.default-dispatcher` is Java's ForkJoinPool. Its size is computed by a call to `ThreadPoolConfig.scaledPoolSize`, so by default the pool starts at (number of processors * 3) threads and is capped at `parallelism-max` (64).
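To make the sizing rule concrete, here is a minimal standalone sketch of the scale-and-clamp calculation described above. It is not Akka's own code: the object name `PoolSizeSketch` and the explicit `processors` parameter are hypothetical, and the values 8 / 3.0 / 64 for `parallelism-min` / `parallelism-factor` / `parallelism-max` are assumed defaults that may differ in your Akka version, so check your `reference.conf`.

```scala
// Standalone sketch of the scale-and-clamp rule; not Akka's implementation.
object PoolSizeSketch {

  // Scale the processor count by `multiplier`, round up,
  // then clamp the result into the range [floor, ceiling].
  def scaledPoolSize(floor: Int, multiplier: Double, ceiling: Int, processors: Int): Int =
    math.min(math.max(math.ceil(processors * multiplier).toInt, floor), ceiling)

  def main(args: Array[String]): Unit = {
    val processors = Runtime.getRuntime.availableProcessors()
    // Assumed defaults: parallelism-min = 8, parallelism-factor = 3.0, parallelism-max = 64
    val threads = scaledPoolSize(8, 3.0, 64, processors)
    println(s"$processors logical processors -> $threads dispatcher threads")
  }
}
```

Under those assumed values, a machine with 8 logical processors would get a pool of 24 threads, and setting the factor to 1 would bring it down to 8.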

expert
  • Thank you for your answer! I tried changing `akka.actor.default-dispatcher.fork-join-executor.parallelism-max`, and it successfully limited the thread count. That works as expected. But the point of my question is: what allows Akka Streams to consume far more than 200% CPU? As I understand it, only two actors/threads should be active after adding the async boundary, so CPU usage should be about 200% even if there are many more threads. – lxohi Oct 29 '18 at 10:12
  • I thought the default settings allow `.async` to assign threads to each of your 4 CPU cores, which pushes the CPU load above 100%. Why it's not 400%, though, I don't know. – expert Oct 29 '18 at 14:06
  • Yes, putting in an async boundary will let it scale as much as it can: there's no limit of two threads or two actors. (Since you likely have more than two incoming events, it will scale to more than two actors.) Exactly how many threads the dispatcher uses will depend on your parallelism factor and your hardware. – David Ogren Oct 30 '18 at 15:23