3

I'm trying out some parallel programming with Scala and Akka, which I'm new to. I've got a pretty simple Monte Carlo Pi application (it approximates pi by sampling random points in a quadrant of a circle) which I've built in several languages. However, the performance of the version I've built in Akka is puzzling me.

I have a sequential version written in pure Scala that tends to take roughly 400ms to complete.

In comparison, with 1 worker actor the Akka version takes around 300-350ms. However, as I increase the number of actors that time increases dramatically: with 4 actors it can be anywhere from 500ms up to 1200ms or higher.

The iterations are divided between the worker actors, so ideally performance should improve as more of them are added. Currently it is getting significantly worse.

My code is

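import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.RoundRobinPool
import java.io.FileWriter

object MCpi{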
  //Declare initial values
  val numWorkers = 2
  val numIterations = 10000000

  //Declare messages that will be sent to actors
  sealed trait PiMessage
  case object Calculate extends PiMessage
  case class Work(iterations: Int) extends PiMessage
  case class Result(value: Int) extends PiMessage
  case class PiApprox(pi: Double, duration: Double)

  //Main method
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("MCpi_System") //Create Akka system
    val master = system.actorOf(Props(new MCpi_Master(numWorkers, numIterations))) //Create Master Actor
    println("Starting Master")

    master ! Calculate //Run calculation
  }
}

//Master
class MCpi_Master(numWorkers: Int, numIterations: Int) extends Actor{

  var pi: Double = _ // Store pi
  var quadSum: Int = _ //the total number of points inside the quadrant
  var numResults: Int = _ //number of results returned
  val startTime: Double = System.currentTimeMillis() //calculation start time

  //Create a group of worker actors
  val workerRouter = context.actorOf(
    Props[MCpi_Worker].withRouter(RoundRobinPool(numWorkers)), name = "workerRouter")
  val listener = context.actorOf(Props[MCpi_Listener], name = "listener")

  def receive = {
    //Tell workers to start the calculation
    //For each worker a message is sent with the number of iterations it is to perform,
    //iterations are split up between the number of workers.
    case Calculate => for(i <- 0 until numWorkers) workerRouter ! Work(numIterations / numWorkers)

    //Receive the results from the workers
    case Result(value) =>
      //Add up the total number of points in the circle from each worker
      quadSum += value
      //Total up the number of results which have been received, this should be 1 for each worker
      numResults += 1

      if(numResults == numWorkers) { //Once all results have been collected
        //Calculate pi
        pi = (4.0 * quadSum) / numIterations
        //Send the results to the listener to output
        listener ! PiApprox(pi, duration = System.currentTimeMillis - startTime)
        context.stop(self)
      }
  }
}
//Worker
class MCpi_Worker extends Actor {
  //Performs the calculation
  def calculatePi(iterations: Int): Int = {

    val r = scala.util.Random // Create random number generator
    var inQuadrant: Int = 0 //Store number of points within circle

    for(i <- 0 to iterations){
      //Generate random point
      val X = r.nextFloat()
      val Y = r.nextFloat()

      //Determine whether or not the point is within the circle
      if(((X * X) + (Y * Y)) < 1.0)
        inQuadrant += 1
    }
    inQuadrant //return the number of points within the circle
  }

  def receive = {
    //Starts the calculation then returns the result
    case Work(iterations) => sender ! Result(calculatePi(iterations))
  }
}

//Listener
class MCpi_Listener extends Actor{ //Receives and prints the final result
  def receive = {
    case PiApprox(pi, duration) =>
      //Print the results
      println("\n\tPi approximation: \t\t%s\n\tCalculation time: \t%s".format(pi, duration))
      //Print to a CSV file
      val pw: FileWriter = new FileWriter("../../../..//Results/Scala_Results.csv", true)
      pw.append(duration.toString())
      pw.append("\n")
      pw.close()
      context.system.terminate()
  }
}

The plain Scala sequential version is

import java.io.FileWriter

object MCpi {
    def main(args: Array[String]): Unit = {
        //Define the number of iterations to perform
        val iterations = args(0).toInt;
        val resultsPath = args(1);

        //Get the current time
        val start = System.currentTimeMillis()


        // Create random number generator
        val r = scala.util.Random
        //Store number of points within circle
        var inQuadrant: Int = 0

        for(i <- 0 to iterations){
            //Generate random point
            val X = r.nextFloat()
            val Y = r.nextFloat()

            //Determine whether or not the point is within the circle
            if(((X * X) + (Y * Y)) < 1.0)
                inQuadrant += 1
        }
        //Calculate pi
        val pi = (4.0 * inQuadrant) / iterations
        //Get the total time
        val time = System.currentTimeMillis() - start
        //Output values
        println("Number of Iterations: " + iterations)
        println("Pi has been calculated as: " + pi)
        println("Total time taken: " + time + " (Milliseconds)")

        //Print to a CSV file
        val pw: FileWriter = new FileWriter(resultsPath + "/Scala_Results.csv", true)
        pw.append(time.toString())
        pw.append("\n")
        pw.close()
    }
}

Any suggestions as to why this is happening or how I can improve performance would be very welcome.

Edit: I'd like to thank all of you for your answers. This is my first question on this site and all the answers are extremely helpful. I have plenty to look into now :)

Cipher478
  • In this case some information about the kind of processor(s) you are running this on is probably helpful. – Jasper-M Jan 19 '17 at 15:05
  • 1) Please post your code on SO. Format it before posting. 2) What do you even expect from actor implementation when you're executing `calculatePi` method multiple times which is from what I can see, an equivalent to your sequential implementation? And from what I see, you're just calculating PI multiple times (number of calculations is equivalent to the number of worker actors which is probably the explanation for slowdown)? Correct me if I'm wrong. 3) Did you consider that you might not gain anything by using an actor model in this case? – Branislav Lazic Jan 19 '17 at 15:07
  • @Jasper-M Processor is an Intel i7-4510U quad core @ 3.1GHz @Branislav 1) Okay, I'll try update the post with the code when I'm free later. 2) `calculatePi` is run by each worker, it generates many random points and measures whether those points are within a "circle" of a particular size (in this case 1.0), then returns how many points were in the circle (quadSum), once the results are back from each worker the calculation is done once, to work out what Pi is (in the master actor). 3) I assumed that I'd get some sort of a performance increase splitting the work over multiple actors. – Cipher478 Jan 19 '17 at 15:25
  • This is a CPU bound task and actors share a thread pool so adding more actors without configuring the pool to host more threads will decrease performance. Keep in mind that the actor pattern is a tool for concurrent communication not for parallel computing – Mustafa Simav Jan 20 '17 at 10:49
  • @MustafaSimav still 2 actors instead of 1 on a quadcore are likely to show a speed-up. actors can be used for parallel computing *as well as* concurrent communication. – Stefano Bonetti Jan 20 '17 at 11:53

3 Answers

8

You have a synchronisation issue around the Random instance you're using.

More specifically, this line

val r = scala.util.Random // Create random number generator

does not actually "create a random number generator"; it picks up the singleton object that scala.util conveniently offers you. This means that all threads share it and contend for its seed, which is updated atomically on every call (see the code of java.util.Random.nextFloat for more info).

Simply by changing that line to

val r = new scala.util.Random // Create random number generator

you should get some parallelisation speed-up. As stated in the comments, how much will depend on your hardware and configuration, but at least the result will no longer be dominated by contention on a shared generator.

Note that java.util.Random seeds a newly created instance from System.nanoTime (combined with a value that differs for each new instance), so you need not worry about randomisation issues.
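To make the difference concrete, here is a minimal sketch of the two cases. The names RandomInstances and countInQuadrant are made up for illustration and are not part of the question's code; the counting loop is the same idea as calculatePi, but it takes its own generator instead of the shared object.

```scala
import scala.util.Random

object RandomInstances {
  // Counts random points that fall inside the unit quadrant.
  // By default each call gets its own generator (new Random),
  // so concurrent callers do not share a seed.
  def countInQuadrant(iterations: Int, rng: Random = new Random): Int = {
    var inQuadrant = 0
    var i = 0
    while (i < iterations) {
      val x = rng.nextFloat()
      val y = rng.nextFloat()
      if (x * x + y * y < 1.0f) inQuadrant += 1
      i += 1
    }
    inQuadrant
  }

  def main(args: Array[String]): Unit = {
    // scala.util.Random without `new` is one shared object.
    val shared1 = scala.util.Random
    val shared2 = scala.util.Random
    println(shared1 eq shared2)           // true: same singleton
    println((new Random) eq (new Random)) // false: two separate generators

    val n = 1000000
    println(4.0 * countInQuadrant(n) / n) // rough approximation of pi
  }
}
```

In the worker actor, the equivalent change is simply creating the generator with new inside calculatePi, as shown above.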

Stefano Bonetti
4

I think it's a great question worth digging into. An Akka actor system comes with some overhead, so I expect a performance gain to show up only when the scale is large enough. I test-ran your two versions (non-Akka vs Akka) with minimal code change. At 1 million or 10 million hits, as expected, there is hardly any performance difference between Akka and non-Akka, or between different numbers of workers. But at 100 million hits, you can see a consistent performance difference.

Besides scaling up the total hits to 100 million, the only code change I made was replacing scala.util.Random with java.util.concurrent.ThreadLocalRandom:

//val r = scala.util.Random // Create random number generator
def r = ThreadLocalRandom.current
...
  //Generate random point
  //val X = r.nextFloat()
  //val Y = r.nextFloat()
  val X = r.nextDouble(0.0, 1.0)
  val Y = r.nextDouble(0.0, 1.0)

This was all done on an old MacBook Pro with a 2GHz quadcore CPU and 8GB of memory. Here are the test-run results at 100 million total hits:

  • Non-Akka app takes ~1720 ms
  • Akka app with 2 workers takes ~770 ms
  • Akka app with 4 workers takes ~430 ms
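For reference, the ThreadLocalRandom change described above can be written out as a complete counting method roughly like this. It is only a sketch: the names ThreadLocalSampling and countInQuadrant are hypothetical, and it looks the generator up once per call rather than through a def.

```scala
import java.util.concurrent.ThreadLocalRandom

object ThreadLocalSampling {
  // Counts random points inside the unit quadrant using the generator
  // that belongs to whichever thread is executing this call.
  def countInQuadrant(iterations: Int): Int = {
    // Fetch the generator inside the method, not in a field: the same
    // actor may be run on different threads for different messages.
    val rng = ThreadLocalRandom.current()
    var inQuadrant = 0
    var i = 0
    while (i < iterations) {
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      if (x * x + y * y < 1.0) inQuadrant += 1
      i += 1
    }
    inQuadrant
  }

  def main(args: Array[String]): Unit = {
    val n = 1000000
    println(4.0 * countInQuadrant(n) / n) // rough approximation of pi
  }
}
```

Because each thread has its own generator, no seed is shared between workers, which is the same effect as creating a new Random per worker.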

Individual test-runs below ...

Non-Akka

$ sbt "runMain calcpi.MCpi 100000000 /tmp"

[info] Loading project definition from /Users/leo/projects/scala/test/akka-calculate-pi/project
[info] Set current project to Akka Pi Calculation (in build file:/Users/leo/projects/scala/test/akka-calculate-pi/)
[info] Running calcpi.MCpi 100000000 /tmp
Number of Iterations: 100000000
Pi has been calculated as: 3.1415916
Total time taken: 1722 (Milliseconds)
[success] Total time: 2 s, completed Jan 20, 2017 3:26:20 PM

$ sbt "runMain calcpi.MCpi 100000000 /tmp"

[info] Loading project definition from /Users/leo/projects/scala/test/akka-calculate-pi/project
[info] Set current project to Akka Pi Calculation (in build file:/Users/leo/projects/scala/test/akka-calculate-pi/)
[info] Running calcpi.MCpi 100000000 /tmp
Number of Iterations: 100000000
Pi has been calculated as: 3.14159724
Total time taken: 1715 (Milliseconds)
[success] Total time: 2 s, completed Jan 20, 2017 3:28:17 PM

Using Akka

Number of Workers = 4:

$ sbt "runMain calcpi.MCpi 100000000 /tmp"

[info] Loading project definition from /Users/leo/projects/scala/test/akka-calculate-pi/project
[info] Set current project to Akka Pi Calculation (in build file:/Users/leo/projects/scala/test/akka-calculate-pi/)
[info] Running calcpi.MCpi 100000000 /tmp
Starting Master

Pi approximation:       3.14110116
Calculation time:   423.0

[success] Total time: 1 s, completed Jan 20, 2017 3:35:25 PM

$ sbt "runMain calcpi.MCpi 100000000 /tmp"

[info] Loading project definition from /Users/leo/projects/scala/test/akka-calculate-pi/project
[info] Set current project to Akka Pi Calculation (in build file:/Users/leo/projects/scala/test/akka-calculate-pi/)
[info] Running calcpi.MCpi 100000000 /tmp
Starting Master

Pi approximation:       3.14181316
Calculation time:   440.0

[success] Total time: 1 s, completed Jan 20, 2017 3:35:34 PM

Number of Workers = 2:

$ sbt "runMain calcpi.MCpi 100000000 /tmp"

[info] Loading project definition from /Users/leo/projects/scala/test/akka-calculate-pi/project
[info] Set current project to Akka Pi Calculation (in build file:/Users/leo/projects/scala/test/akka-calculate-pi/)
[info] Running calcpi.MCpi 100000000 /tmp
Starting Master

Pi approximation:       3.14162344
Calculation time:   766.0

[success] Total time: 2 s, completed Jan 20, 2017 3:36:34 PM

$ sbt "runMain calcpi.MCpi 100000000 /tmp"

[info] Loading project definition from /Users/leo/projects/scala/test/akka-calculate-pi/project
[info] Set current project to Akka Pi Calculation (in build file:/Users/leo/projects/scala/test/akka-calculate-pi/)
[info] Running calcpi.MCpi 100000000 /tmp
Starting Master

Pi approximation:       3.14182148
Calculation time:   787.0

[success] Total time: 2 s, completed Jan 20, 2017 3:36:43 PM

Leo C
  • You were completely right, the issue was the random number generator I was using. There's also a bigger speed difference with a larger scale. Thank you very much for this :) – Cipher478 Jan 21 '17 at 15:11
-4

I think your issue is caused by running heavy calculations in the body of the receive function. It may be that some of them run on the same thread, so you are just adding the actor system's overhead to your standard single-threaded computation, which makes it slower. From the Akka documentation:

Behind the scenes Akka will run sets of actors on sets of real threads, where typically many actors share one thread, and subsequent invocations of one actor may end up being processed on different threads. Akka ensures that this implementation detail does not affect the single-threadedness of handling the actor’s state.

I am not sure whether this is the case, but you may try running your computation in a Future:

Future {
  //your code
}

To make it work you need to provide implicit execution context, you can do this in many ways, but two are the easiest:

  1. Import global execution context

  2. Import execution context of the actor:

    import context.dispatcher

The second one has to be used inside your actor class body.
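As a rough illustration of the first option, here is a sketch that uses the global execution context outside of any actor. The object name FuturePi and the methods countInQuadrant and estimatePi are hypothetical, and I have used ThreadLocalRandom so the tasks do not share a generator; treat it as a starting point rather than a tested replacement for the actor version.

```scala
import java.util.concurrent.ThreadLocalRandom
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FuturePi {
  // Counts random points inside the unit quadrant on the calling thread.
  def countInQuadrant(iterations: Int): Int = {
    val rng = ThreadLocalRandom.current()
    var inQuadrant = 0
    var i = 0
    while (i < iterations) {
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      if (x * x + y * y < 1.0) inQuadrant += 1
      i += 1
    }
    inQuadrant
  }

  // Splits the work into numTasks Futures and combines their counts.
  def estimatePi(totalIterations: Int, numTasks: Int): Double = {
    val perTask = totalIterations / numTasks
    // Each Future is scheduled on the implicit global execution context.
    val tasks = Seq.fill(numTasks)(Future(countInQuadrant(perTask)))
    val inside = Await.result(Future.sequence(tasks), 1.minute).sum
    // Divide by the iterations actually performed (integer division
    // above may drop a remainder).
    4.0 * inside / (perTask.toLong * numTasks)
  }

  def main(args: Array[String]): Unit = {
    println(estimatePi(10000000, 4)) // rough approximation of pi
  }
}
```

Inside an actor you would use import context.dispatcher instead of the global import, and avoid blocking with Await in receive.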

L.Lampart