First of all, I think you might be confused about how Futures are scheduled:
"And then I start them at the same time using a for comprehension that yields a list of the returned values."
As a matter of fact, a Future is scheduled for execution as soon as it is created (that is, as soon as the Future.apply method is called).
Let me illustrate that with a small code snippet:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}

val start = System.currentTimeMillis()
val fs = (1 to 20).grouped(2).map { x =>
  Future {
    val ts = System.currentTimeMillis() - start
    Thread.sleep(1000)
    (x.head, x.head + x.length - 1, ts)
  }
}
val res = Future.sequence(fs)
Await.result(res, Duration.Inf).foreach(println)
Here I have a range (1 to 20), which I divide into equal pieces, creating a Future from each piece. Each future records the time at which it started executing (relative to the start of the program), as well as the start and end index of its piece within the original range.
Note also that each future sleeps for one second. If the futures were executed sequentially, we would see start times roughly one second apart; if, on the other hand, they are started in parallel, the start timestamps will be almost the same.
Here are the results that I got on my machine (the first two numbers are indices, the third is the relative start timestamp in milliseconds):
(1,2,7)
(3,4,8)
(5,6,8)
(7,8,8)
(9,10,9)
(11,12,9)
(13,14,9)
(15,16,9)
(17,18,1011)
(19,20,1011)
As you can see, the first 8 futures started at the same time, while the 9th and 10th were delayed by 1 second.
Why did that happen? Because I used the scala.concurrent.ExecutionContext.Implicits.global execution context, which by default has parallelism equal to the number of processor cores (8 in my case).
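To see what that default is likely to be on your own machine, you can ask the JVM for its core count. This is only a small sketch: `GlobalPoolInfo` is a name made up for illustration, and it reports the number the global context derives its default parallelism from, not the pool size itself.

```scala
object GlobalPoolInfo {
  // Number of processor cores visible to the JVM. By default, the global
  // execution context sizes its parallelism from this value.
  def cores: Int = Runtime.getRuntime.availableProcessors()
}
```

Printing `GlobalPoolInfo.cores` on the machine used above would presumably give 8. As far as I know, the default can also be overridden with the `scala.concurrent.context.numThreads` and `scala.concurrent.context.maxThreads` system properties, but check the documentation for your Scala version before relying on that.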
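Let's try supplying an executor with higher parallelism:
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
// replaces the ExecutionContext.Implicits.global import from the first snippet
implicit val ex: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(512))
// same code as before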
Result:
(1,2,8)
(3,4,8)
(5,6,16)
(7,8,13)
(9,10,13)
(11,12,14)
(13,14,14)
(15,16,14)
(17,18,15)
(19,20,16)
As expected, all futures started at approximately the same time.
And a final test: let's use an executor with a parallelism of one:
implicit val ex: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))
Result:
(1,2,4)
(3,4,1005)
(5,6,2008)
(7,8,3009)
(9,10,4010)
(11,12,5014)
(13,14,6016)
(15,16,7016)
(17,18,8017)
(19,20,9019)
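If you want to repeat these experiments with different pool sizes, the three runs above can be folded into one helper. This is a sketch under my own assumptions, not the exact code I ran: `PoolDemo`, `startOffsets` and its parameters are hypothetical names, and the helper also shuts the pool down, because the threads created by `Executors.newFixedThreadPool` are non-daemon and would otherwise keep the JVM alive.

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

object PoolDemo {
  // Hypothetical helper: runs `tasks` futures, each sleeping `sleepMs`,
  // on a fixed pool of `poolSize` threads. Returns, for every future,
  // how many milliseconds after the start its body began executing.
  def startOffsets(poolSize: Int, tasks: Int, sleepMs: Long): List[Long] = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val start = System.currentTimeMillis()
      // A List is strict, so every Future is created (and scheduled) here.
      val fs = List.tabulate(tasks) { _ =>
        Future {
          val ts = System.currentTimeMillis() - start
          Thread.sleep(sleepMs)
          ts
        }
      }
      Await.result(Future.sequence(fs), Duration.Inf)
    } finally {
      pool.shutdown() // release the non-daemon worker threads
    }
  }
}
```

Calling `PoolDemo.startOffsets(1, 10, 1000)` should give the same one-second staircase as the last run, while a pool size of at least 10 should give offsets that are all close to each other.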
I hope that helps you understand when Futures are scheduled for execution and how to control parallelism. Feel free to ask any questions in the comments.
UPD: A clearer example of how to process a matrix in batches using Futures.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}

val m = List(
  Array(1, 2, 3),
  Array(4, 5, 6),
  Array(7, 8, 9),
  Array(1, 2, 3),
  Array(4, 5, 6)
)
val n = 3 // desired number of futures
val batchSize = Math.ceil(m.size / n.toDouble).toInt
val fs: Iterator[Future[Int]] = m.grouped(batchSize).map {
  rows: List[Array[Int]] =>
    Future {
      rows.map(_.sum).sum // any logic for the rows;
                          // sum of the elements
                          // as an illustration
    }
}
// convert Iterator[Future[_]] to Future[Iterator[_]]
val res = Future.sequence(fs)
// print results
Await.result(res, Duration.Inf).foreach(println)
Result:
21 // sum of the elements of the first two rows
30 // ... third and fourth rows
15 // ... the single last row
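If you need this in more than one place, the same batching logic can be wrapped in a function. The following is a minimal sketch, assuming that a per-batch sum is all you need: `BatchSums` and `batchSums` are names I made up, and the guard for an empty matrix is my addition. It collects the batches into a List first, so all futures are created up front and the result is a `Future[List[Int]]` rather than a future of an iterator.

```scala
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

object BatchSums {
  // Hypothetical helper: splits `rows` into at most `n` batches and
  // sums the elements of each batch in its own Future.
  def batchSums(rows: List[Array[Int]], n: Int)(
      implicit ec: ExecutionContext): Future[List[Int]] = {
    require(n > 0, "n must be positive")
    // max(1, ...) keeps grouped() valid when `rows` is empty
    val batchSize = math.max(1, math.ceil(rows.size / n.toDouble).toInt)
    val fs: List[Future[Int]] =
      rows.grouped(batchSize).toList.map { batch =>
        Future(batch.map(_.sum).sum) // replace with any per-batch logic
      }
    // convert List[Future[Int]] to Future[List[Int]]
    Future.sequence(fs)
  }
}
```

With the global execution context in implicit scope, `Await.result(BatchSums.batchSums(m, 3), Duration.Inf)` for the matrix `m` above yields `List(21, 30, 15)`, matching the result shown.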