
I have to perform some actions on a 2D List in Scala, and I'm trying to parallelize that task.

Currently I have three Futures, where each takes N lines of the matrix and performs the necessary calculations. It's written like this:

val future1: Future[List[Int]] = Future { makeCalculations(0, 5) }
val future2: Future[List[Int]] = Future { makeCalculations(6, 10) }
val future3: Future[List[Int]] = Future { makeCalculations(11, 15) }

And then I start them at the same time using a for comprehension that yields a list of the returned values.

The thing is, I want this to be dynamic by passing an Int to this function and having it create that exact number of futures.

I tried having a for comprehension to yield the futures but it seems like they are started sequentially and I want them to start at the same time. Am I looking in the wrong place? Is there a better way to do this?

ggfpc
  • http://viktorklang.com/blog/Futures-in-Scala-protips-2.html – Zernike Jun 11 '17 at 16:41
  • @Zernike Thank you, that really helped, but I still have to explicitly call fx <- future x for each one. Is there a way I can do this without knowing the number of futures beforehand? Like in Java, where you can create a number of threads inside a loop. – ggfpc Jun 11 '17 at 16:58
  • You could create a list of input values and then use `Future.traverse`. See this other question: https://stackoverflow.com/questions/44309996/scala-future-processing-depth-first-not-breadth-first It has a more advanced example that also controls the degree of parallelism. – Zernike Jun 11 '17 at 17:03

1 Answer


First of all, I think you might be confused about how Futures are scheduled:

"And then I start them at the same time using a for comprehension that yields a list of the returned values."

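As a matter of fact, a Future is scheduled for execution as soon as it is created (that is, when Future.apply is called). In your code, the three futures are already running by the time the for comprehension combines their results.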

Let me illustrate that with a small code snippet:

  import scala.concurrent.ExecutionContext.Implicits.global
  import scala.concurrent.duration.Duration
  import scala.concurrent.{Await, Future}

  val start = System.currentTimeMillis()

  val fs = (1 to 20).grouped(2).map { x =>
    Future {
      val ts = System.currentTimeMillis() - start
      Thread.sleep(1000)
      (x.head, x.head + x.length - 1, ts)
    }
  }

  val res = Future.sequence(fs)

  Await.result(res, Duration.Inf).foreach(
    println
  )

Here I take the range (1 to 20), divide it into equal pieces, and create a Future from each piece. Each future returns the time at which it started running (relative to the start of the program), as well as the start and end index of its piece within the original range.

You might also notice that each future has a delay inside. If the futures were executed sequentially, we would see a big difference in start times; if they are started in parallel, the start timestamps will be almost the same.

Here are the results that I got on my machine (first two numbers are indices, third number is relative start timestamp):

  (1,2,7)
  (3,4,8)
  (5,6,8)
  (7,8,8)
  (9,10,9)
  (11,12,9)
  (13,14,9)
  (15,16,9)
  (17,18,1011)
  (19,20,1011)

As you can see, the first 8 futures were started at the same time, while the 9th and 10th were delayed by 1 second.

Why did that happen? Because I used the scala.concurrent.ExecutionContext.Implicits.global execution context, which by default has parallelism equal to the number of processor cores (8 in my case).
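
If you want to check the value on your own machine, here is a minimal sketch. The object name `CoresSketch` is hypothetical and used only for illustration; `Runtime.getRuntime.availableProcessors()` is the standard JVM call that reports the number of logical processors.

```scala
object CoresSketch {
  // Number of logical processors the JVM reports. By default, the global
  // execution context derives its parallelism from this value.
  def cores: Int = Runtime.getRuntime.availableProcessors()

  def main(args: Array[String]): Unit =
    println(s"available processors: $cores")
}
```

The printed number depends on the machine, so no expected output is shown here.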


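Let's try supplying an executor with higher parallelism:

  import java.util.concurrent.Executors
  import scala.concurrent.ExecutionContext

  // replaces the import of ExecutionContext.Implicits.global
  implicit val ex: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newFixedThreadPool(512))
  // same code as before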

Result:

  (1,2,8)
  (3,4,8)
  (5,6,16)
  (7,8,13)
  (9,10,13)
  (11,12,14)
  (13,14,14)
  (15,16,14)
  (17,18,15)
  (19,20,16)

As expected, all futures were started approximately at the same time.
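
One practical caveat when using your own pool: threads created by `Executors.newFixedThreadPool` are non-daemon by default, so the JVM will not exit until the pool is shut down. Below is a minimal sketch of one way to handle that. The object name `FixedPoolSketch`, the `run` helper, and the doubling task are hypothetical and not part of the measurements above.

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

object FixedPoolSketch {
  // Runs `tasks` trivial futures on a dedicated pool of `threads` threads,
  // waits for all of them, and shuts the pool down afterwards.
  def run(threads: Int, tasks: Int): List[Int] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Each future is scheduled on the pool as soon as it is created.
      val fs = (1 to tasks).toList.map(i => Future(i * 2))
      Await.result(Future.sequence(fs), Duration.Inf)
    } finally {
      pool.shutdown() // lets the JVM exit once the submitted work is done
    }
  }

  def main(args: Array[String]): Unit =
    println(run(threads = 4, tasks = 5))
}
```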


As a final test, let's use an executor with a parallelism of one:

  implicit val ex =
    ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))

Result:

  (1,2,4)
  (3,4,1005)
  (5,6,2008)
  (7,8,3009)
  (9,10,4010)
  (11,12,5014)
  (13,14,6016)
  (15,16,7016)
  (17,18,8017)
  (19,20,9019)

I hope this helps clarify when Futures are scheduled for execution and how to control parallelism. Feel free to ask any questions in the comments.
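
To tie this back to the for comprehension in the question, here is a minimal sketch contrasting futures created inside a for comprehension with futures created beforehand. The `slow` helper and the object name `ForComprehensionSketch` are hypothetical, made up for illustration only.

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}

object ForComprehensionSketch {
  // Hypothetical stand-in for real work: sleeps, then returns its input.
  def slow(x: Int): Int = { Thread.sleep(200); x }

  // The futures are created inside the for comprehension, so the second
  // one is only created (and scheduled) after the first has completed.
  def sequential(): Future[List[Int]] =
    for {
      a <- Future(slow(1))
      b <- Future(slow(2))
    } yield List(a, b)

  // The futures are created first, so both are already scheduled before
  // the for comprehension combines their results. They can run
  // concurrently if the execution context has free threads.
  def concurrent(): Future[List[Int]] = {
    val fa = Future(slow(1))
    val fb = Future(slow(2))
    for {
      a <- fa
      b <- fb
    } yield List(a, b)
  }

  def main(args: Array[String]): Unit = {
    println(Await.result(sequential(), Duration.Inf))
    println(Await.result(concurrent(), Duration.Inf))
  }
}
```

Both variants yield the same list; only the scheduling differs. If a for comprehension appears to start futures sequentially, the likely cause is that the futures are being created inside it.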


UPD: A clearer example of how to perform batch processing of a matrix using Futures.

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}


val m = List(
  Array(1,2,3),
  Array(4,5,6),
  Array(7,8,9),
  Array(1,2,3),
  Array(4,5,6)
)

val n = 3 // desired number of futures (grouping by batch size yields at most n)
val batchSize = Math.ceil(m.size / n.toDouble).toInt

val fs: Iterator[Future[Int]] = m.grouped(batchSize).map { rows =>
  Future {
    // any logic for the rows; the sum of their elements is just an illustration
    rows.map(_.sum).sum
  }
}

// convert Iterator[Future[_]] to Future[Iterator[_]]
val res = Future.sequence(fs)

// print results
Await.result(res, Duration.Inf).foreach(
  println
)

Result:

21 //sum of the elements of the first two rows
30 //... third and fourth row
15 //... single last row
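
If you would rather keep a function that takes the first and last row index, as in the question, the same idea can be sketched with `Future.traverse` (suggested in the comments on the question). Everything named here is an assumption for illustration: the real signature of `makeCalculations` is not shown in the question, so this stand-in takes the matrix explicitly and just sums each row, and `rowRanges` and `calculateAll` are hypothetical helpers.

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}

object RowRangeSketch {
  // Hypothetical stand-in for the question's makeCalculations(first, last):
  // returns the sum of each row in the inclusive index range [first, last].
  def makeCalculations(m: Vector[Vector[Int]], first: Int, last: Int): List[Int] =
    (first to last).map(i => m(i).sum).toList

  // Splits the row indices 0 until rows into at most n contiguous,
  // inclusive (first, last) ranges.
  def rowRanges(rows: Int, n: Int): List[(Int, Int)] = {
    require(n > 0, "n must be positive")
    val batchSize = math.max(1, math.ceil(rows / n.toDouble).toInt)
    (0 until rows).grouped(batchSize).map(g => (g.head, g.last)).toList
  }

  // Creates one future per range and collects the results in row order.
  def calculateAll(m: Vector[Vector[Int]], n: Int): Future[List[Int]] =
    Future.traverse(rowRanges(m.size, n)) { case (first, last) =>
      Future(makeCalculations(m, first, last))
    }.map(_.flatten)

  def main(args: Array[String]): Unit = {
    val m = Vector(
      Vector(1, 2, 3),
      Vector(4, 5, 6),
      Vector(7, 8, 9),
      Vector(1, 2, 3),
      Vector(4, 5, 6)
    )
    println(rowRanges(m.size, 3))
    println(Await.result(calculateAll(m, 3), Duration.Inf))
  }
}
```

The number of futures is driven entirely by `n`, so nothing has to be hardcoded. Note that because the batch size is rounded up, fewer than `n` futures may be created when the rows do not divide evenly.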
Aivean
  • Thank you, that cleared some things up. I think my issue is mostly with syntax. Right now with my for comprehension I can yield a List(f1,f2,f3), how can I do the same thing if I don't have access to each variable? – ggfpc Jun 11 '17 at 22:45
  • @ggfpc It's not entirely clear to me what your problem is and what you are currently trying to do. In my answer I tried to give you one variant of how you can dynamically create futures in a loop (`Future.sequence((1 to 20).map(_ => Future(???)))`). If this is not what you want, please expand your question with specifics (what your "2D list" and your current `for` loop look like, the signature of the function that you want to implement, etc). – Aivean Jun 11 '17 at 22:55
  • I will try the snippet you showed. I have to say I'm a bit confused as well. What I want to do is parallelize a function that performs some calculations on a matrix (such as adding the values of columns adjacent to the current one). My idea was to have N Futures, and each Future would have (Size / N) rows of the matrix to work with. In the end I'd have a list of the results. Currently my function receives the index of the first and last row, which is what you see in my original post, and it works. The only thing left to do is to somehow turn those hardcoded Futures into variables. – ggfpc Jun 11 '17 at 23:10
  • @ggfpc, ok, please check the update. I tried to make my example more clear. – Aivean Jun 11 '17 at 23:58