
I have a large computation roughly based on the following pattern:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global // or whichever ExecutionContext you use

def f1(i: Int): Int = ???
def f2(i: Int): Int = ???

def processA(l: List[Int]): List[Future[Int]] =
  l.map(i => Future(f1(i)))

def processB(l: List[Int]): List[Future[Int]] = {
  val p = processA(l)
  p.map(fut => fut.map(f2))
}

def main(): Unit = {
  val items = List( /* 1k to 10k items here */ )
  val results = processB(items)
  results.map(_.onComplete(...))
}

The problem I encounter, if my understanding is correct, is that the processing is breadth-first: processA starts thousands of Futures, and processB then enqueues thousands of new Futures that will only be processed after those of processA have finished. The onComplete callbacks start firing very late...

I would like to turn this depth-first: a few Futures from processA start, and then processB continues from them instead of switching to something else in the queues.

Can it be done in vanilla Scala? Should I turn to some library with an alternative to Futures and thread pools?

EDIT: a bit more detail. Rewriting this as f1 andThen f2, as suggested in the answers, is not practicable at this time. processA and processB actually do a bunch of other things (including side effects), and the fact that processB relies on processA is an internal detail; exposing it would break separation of concerns.

EDIT 2: I think I'm going to relax the "vanilla" constraint a bit. Someone suggested Akka Streams, which would help. I'm currently having a look at scalaz.Task: does anyone have an opinion on it?
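For reference, here is a minimal sketch of the scalaz.Task direction mentioned above, reusing f1, f2 and items from the snippet at the top (assuming scalaz 7.2; the pool size of 8 and the helper name processItem are illustrative only). Because Task is lazy, f2 is chained onto f1 and both run back to back once a worker picks the task up, while the fixed-size executor bounds how many items are in flight:

import java.util.concurrent.Executors
import scalaz.concurrent.Task

val pool = Executors.newFixedThreadPool(8)  // at most 8 items processed at once

// f1 runs on the pool, and f2 runs right after it on the same thread;
// nothing starts until the task is actually executed
def processItem(i: Int): Task[Int] =
  Task(f1(i))(pool).map(f2)

val all: Task[List[Int]] = Task.gatherUnordered(items.map(processItem))
val results: List[Int] = all.unsafePerformSync  // blocks for the combined result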

mathieu
  • You mean that all Futures created in `p.map(fut => fut.map(f2))` will only start after every Future created in `val p = processA(l)` has finished? I don't think that should necessarily be the case. – Jasper-M Jun 01 '17 at 14:30
  • Can you put `f1`, `f2` and the `onComplete` parameter in the same synchronous function? – Cyrille Corpet Jun 01 '17 at 14:30
  • @CyrilleCorpet No. f1, f2, processA, processB, onComplete are actually quite large code pieces, in different layers of the app. – mathieu Jun 01 '17 at 14:35
  • @Jasper-M Not all futures, but that's what I tend to see. With this simplified code, a fast task in f1(), and something heavier in f2(), you can easily see hundreds of f1 executed before any execution of f2() – mathieu Jun 01 '17 at 15:00
  • You mean before any execution of `f2` starts? Or finishes? Anyway I'm not really a Futures expert. I think this behaviour largely depends on the `ExecutionContext` implementation that you're using. – Jasper-M Jun 01 '17 at 15:08

5 Answers

6

I wasn't 100% sure I understood the question. Since processB (f2) runs on top of the results of processA (f1), you cannot call f2 on values which have not yet been computed by f1, so my answer is based on the assumption that:

  • You want to limit work-in-progress
  • You want to execute f2 immediately after f1

So here's one solution to that:

import scala.concurrent._
def process(noAtATime: Int, l: List[Int])(transform: Int => Int)(implicit ec: ExecutionContext): Future[List[Int]] = {
  // define an inner async "loop" to process one chunk of numbers at a time
  def batched(i: Future[Iterator[List[Int]]], result: List[List[Int]]): Future[List[Int]] =
    i flatMap { it =>
      // if there are more chunks to process
      // we process all numbers in the chunk as parallel as possible,
      // then combine the results into a List again, then when all are done,
      // we recurse via flatMap+batched with the iterator
      // when we have no chunks left, then we un-chunk the results
      // reassemble it into the original order and return the result
      if(it.hasNext) Future.traverse(it.next)(n => Future(transform(n))).flatMap(re => batched(i, re :: result))
      else Future.successful(result.reverse.flatten) // Optimize this as needed
    }
  // Start the async "loop" over chunks of input and with an empty result
  batched(Future.successful(l.grouped(noAtATime)), List.empty)
}


scala> def f1(i: Int) = i * 2 // Dummy impl to prove it works
f1: (i: Int)Int

scala> def f2(i: Int) = i + 1 // Dummy impl to prove it works
f2: (i: Int)Int

scala> process(noAtATime = 100, (1 to 10000).toList)(n => f2(f1(n)))(ExecutionContext.global)
res0: scala.concurrent.Future[List[Int]] = Future(<not completed>)

scala> res0.foreach(println)(ExecutionContext.global)

scala> List(3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119 …

If you are willing and able to use a library which is better suited for the problem at hand, have a look at this reply.

Viktor Klang
  • This waits for entire chunk to complete before starting the next one. Doesn't really work very well for anything slightly more complicated than doing basic math on ints. – Dima Jun 03 '17 at 12:49
  • I don't understand how you can conclude that—since it can chunk and unchunk it can most definitely do this arbitrarily nested—with as many chunks as wanted. – Viktor Klang Jun 03 '17 at 14:50
  • "chunk and unchunk"? I don't know what it means, sorry. You can nest arbitrarily all you want, but the fact is that until the previous chunk is finished, the next one won't start. So, if you are running, on say, 16 cores, and the chunk size is 16, and 15 tasks finished in 10 seconds, and one takes a minute, then you have 15 cores idle for 50 seconds – Dima Jun 03 '17 at 15:03
  • You can decide how many you want to have running concurrently, that is what it means. I.e. in my example it only runs one chunk after the other, but that is not an inherent limitation as the code is fully nestable. – Viktor Klang Jun 03 '17 at 15:47
  • The inherent limitation is that there is no way to make N tasks to be executing at any time. It's not about how many chunk you run in parallel, it's about the whole chunk completion being a condition to start another chunk. – Dima Jun 03 '17 at 19:01
  • I understand what you're saying. However, maximum occupancy was not a listed prerequisite. If that is required, it's fairly straightforward to move the items into a CLQ and proceed from there. Or even going the Akka Streams route. – Viktor Klang Jun 03 '17 at 19:22
  • I didn't say that it was a prerequisite. Your solution is acceptable. Just inferior. There is no need for queueing (blocking! :O) or heavy-weight libraries, like akka. A simple semaphore is completely sufficient for a simple scheduling task like this. "Entia non sunt multiplicanda praeter necessitatem", you know ... – Dima Jun 03 '17 at 19:37
  • Queueing does not necessitate blocking. There are many non-blocking concurrent MPMC queues. In my book a solution which uses Semaphore (blocking!) in a potentially deadlocking way will always be worse--since it is likely to break in production. – Viktor Klang Jun 04 '17 at 06:15
  • Choosing an inferior solution because you are afraid that the correct one "will break"? Well ... "your book" is wrong. Queuing doesn't necessitate blocking, but your code as written does, when executed on the standard scala runtime. – Dima Jun 04 '17 at 14:23
  • No. The threads in the thread pool are free to execute other workloads in the meantime. Stop spreading misinformation, please. – Viktor Klang Jun 04 '17 at 20:47
  • Furthermore, it's easy to refute the correctness of deadlocking code. – Viktor Klang Jun 04 '17 at 20:48
  • there are no other workloads. "It's easy to refute correctness" only when the code is not correct, which is not the case here ... that happens to be exactly the reason you haven't actually refuted anything, despite talking at length about how easy it is. :) Refuting the correctness of your code is not as easy (which is always the case with obtuse, unnecessarily complex, solutions), but I have explained its deficiency to you. Just admit it already. It's getting old. – Dima Jun 05 '17 at 10:51
  • I think anyone who reads this conversation will be more than capable of seeing through your trolling. Bye. – Viktor Klang Jun 05 '17 at 10:54
  • "Trolling", eh? :D Yeah, I think, everyone can see by now everything that needed to be seen – Dima Jun 05 '17 at 13:28
1

Your problem is best expressed as a stream: jobs go into the stream and are processed, and backpressure ensures that only a limited amount of work is done at a time. In Akka Streams, it looks like this:

Source(items)
  .mapAsync(4)(i => Future(f1(i)))  // f1 and f2 return plain Ints, so wrap them in Futures for mapAsync
  .mapAsync(4)(i => Future(f2(i)))
  .<whatever you want to do with the result>

The parallelism will need to be carefully selected to match the thread pool sizing, but what this ensures is that the average rate of items going through f1 will equal the average rate of items going through f2.
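For context, a more complete, runnable sketch of this approach (assuming Akka 2.5; the system name, the dummy f1/f2 bodies and the parallelism of 4 are illustrative only):

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future

object StreamPipeline extends App {
  implicit val system = ActorSystem("pipeline")
  implicit val materializer = ActorMaterializer()
  import system.dispatcher  // ExecutionContext for the Futures below

  def f1(i: Int): Int = i * 2  // dummy implementations
  def f2(i: Int): Int = i + 1

  val items = (1 to 10000).toList

  Source(items)
    .mapAsync(4)(i => Future(f1(i)))   // at most 4 f1 calls in flight
    .mapAsync(4)(i => Future(f2(i)))   // at most 4 f2 calls in flight
    .runWith(Sink.foreach(println))    // stand-in for "whatever you want to do with the result"
    .onComplete(_ => system.terminate())
}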

James Roper
  • I gave an answer that just uses Akka streams as an example - you could use any streaming technology to do this. You don't seriously expect people not to use an abstraction built to handle their use case perfectly just because they didn't mention that they wanted to use such an abstraction in their original question? – James Roper Jun 03 '17 at 12:54
  • Depends on whether the "abstraction" is standard ("vanilla") Scala or a heavy add-on library. It's not about what I "expect" so much as what the question actually asks. But, yeah, I would not expect people to seriously consider bringing the whole of akka into their architecture just to sequence a bunch of futures. – Dima Jun 03 '17 at 13:15
0

What is wrong with this?

listOfInts.par.map((f1 _).andThen(f2))
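If you go this route, note that a parallel collection defaults to a pool sized to the number of available cores; here is a minimal sketch of capping the parallelism explicitly, reusing listOfInts, f1 and f2 from above (assuming Scala 2.12, where ForkJoinTaskSupport accepts a java.util.concurrent.ForkJoinPool):

import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport

val par = listOfInts.par
par.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(4))  // limit to 4 worker threads
val results = par.map(i => f2(f1(i)))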
oxbow_lakes
0

It is not clear from the question whether, from an asynchronous-computation point of view, there is a requirement that f1 and f2 be separated into processA and processB. Results from f1 are always and only passed to f2, which can be done more simply as part of a single computation, "f1 andThen f2", in a single Future.

If that is the case, then the problem reduces to "how can an asynchronous computation be run on a potentially large input, limiting the number of in-flight, spawned Futures":

import scala.concurrent.Future
import java.util.concurrent.Semaphore
import scala.concurrent.ExecutionContext.Implicits.global

val f: (Int) => Int = i => f2(f1(i))

def process(concurrency: Int, input: List[Int], f: Int => Int): Future[List[Int]] = {
  val semaphore = new Semaphore(concurrency)
  Future.traverse(input) { i =>
    semaphore.acquire()
    Future(f(i)).andThen { case _ => semaphore.release() }
  }
}
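A hypothetical usage of the process helper above, keeping at most 8 Futures in flight over the items from the question:

val results: Future[List[Int]] = process(concurrency = 8, input = items, f = i => f2(f1(i)))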
ectrop
  • I'm afraid this would not only block threads without using the BlockContext on the given (in this case `global`) ExecutionContext, but it also risks deadlocking the entire thread pool. – Viktor Klang Jun 03 '17 at 18:37
  • It will block exactly one thread, the one that executes process(). If the requirement is indeed around limiting the number of Futures spawned, there must be a bit of synchronization taking place somewhere "are there already max allowed in-flight async computations going on?" before spawning a new one. I don't see the deadlock though - would appreciate some details on that! – ectrop Jun 03 '17 at 18:48
  • @ectrop you may block ExecutionContext threads (assuming `process` is also run on same execution context) on semaphore acquiring, before giving a chance to threads from the pool to enter the `andThen` and release the lock. Using blocking may help, but I think there will still be a chance to deadlock. – gilad hoch Jun 03 '17 at 23:24
  • Semaphore acquiring blocks only one thread - always the same thread - that runs process(). If that is the only thread in the ECs pool, or if all the other threads are blocked, then sure - it's a deadlock. But under a sane configuration, if you have enough (even a single) other threads available to run the f1 & f2 computations, then those threads would also be available to do the semaphore releasing. Really don't see where the deadlock could come from. – ectrop Jun 04 '17 at 06:30
  • Same execution context is passed implicitly to `Future.traverse`, `Future.apply` and `andThen`. `traverse` will run the body (including semaphore acquiring) on the expense of EC threads. What should happen if all EC threads are assigned to semaphore acquiring before any of the threads has a chance to release the lock? – gilad hoch Jun 04 '17 at 13:10
  • Re: "what should happen if all EC threads are assigned to semaphore acquiring" - this doesn't happen, verify for yourself. Exactly one thread is assigned to run process(), which performs the semaphore acquiring bit on that thread, not on a subsequently spawned thread / Future.apply computation. – ectrop Jun 04 '17 at 17:08
  • https://github.com/scala/scala/blob/v2.11.11/src/library/scala/concurrent/Future.scala#L574 the for expression uses `flatMap` to fold over the futures, which takes EC implicitly. I.e: only the first item in the collection will acquire semaphore on main thread. All others will be executed on EC threads. – gilad hoch Jun 05 '17 at 04:44
  • It is easy to get into situations where many threads in the same EC are calling `process` simultaneously (besides the fact that using the semaphore won't work for ECs with only one thread). Not trying to criticise, rather trying to avoid people getting paged. :) – Viktor Klang Jun 05 '17 at 07:33
  • @giladhoch there is no for(-comp) expression in the example I've given, nor in the implementation of Future.traverse() that you point to - so not sure what you mean there. Furthermore, the foldLeft in Future.traverse is applied to the `in: M[A]` parameter, which in our case is a List, and definitely does not take an EC as a (implicit) parameter. – ectrop Jul 08 '17 at 21:28
  • Inside the fold: `for (r <- fr; b <- fb) yield (r += b)` – gilad hoch Jul 10 '17 at 03:10
-1

Something like this, perhaps:

items.foreach { it => processB(Seq(it)).onComplete(...) }

Of course, this does not help very much if your f2 is significantly heavier than f1. You'll need some more explicit coordination in that case:

import java.util.concurrent.Semaphore

val batchSize = 10
val sem = new Semaphore(batchSize)
items.foreach { it =>
  sem.acquire()
  processB(Seq(it))  // assumes processB returns a single Future for the given items
    .andThen { case _ => sem.release() }
    .onComplete { ... }
}
Dima
  • I find the semaphore idea very interesting. It remains to integrate this properly with the onFailure stuff here and there in the stack. – mathieu Jun 02 '17 at 08:30
  • `.andThen` takes care of failures as well. Shouldn't be anything else needed to "integrate" – Dima Jun 02 '17 at 10:49
  • No need to introduce needless blocking. Futures perfectly describe dependency in time. – Viktor Klang Jun 03 '17 at 12:22
  • @ViktorKlang if you have a way to solve it without blocking, do tell. Your solution (1) has blocking too (just because you hide it inside `flatMap` doesn't mean it doesn't happen), and (2) is inferior to this one, albeit more complex: it waits for the completion of the entire previous chunk before starting the next one. That's inefficient in case the execution times of different tasks fluctuate significantly, and leads to bursty loads and periods of idling resources. – Dima Jun 03 '17 at 12:41
  • @Dima There is 0 blocking in my proposed solution. And since it deals with chunking and rechunking it can do that arbitrarily nestedly for any number of concurrent chunks. – Viktor Klang Jun 03 '17 at 14:51
Your solution blocks inside `.flatMap` to wait for the future to be satisfied. Just because it happens on a thread pool as opposed to the main thread doesn't mean it won't block. If you are averse to blocking the main thread for some reason, just put `Future { ... }` around my whole answer - problem solved. I am not talking about concurrent chunks. I am talking about the number of tasks being executed at a given time. If the chunk size is 16, and 15 tasks finish in a second, and one takes a minute, then for 59 seconds there is only one task being executed. – Dima Jun 03 '17 at 15:08
  • I suspect you're talking about something completely different than [blocking](https://stackoverflow.com/questions/2407589/what-does-the-term-blocking-mean-in-programming). My answer is completely non-blocking, doesn't block any thread, including the calling thread. – Viktor Klang Jun 03 '17 at 15:57
  • @ViktorKlang no I am talking about blocking. Your solution blocks executor threads that will have to wait for all futures of one chunk to get satisfied before executing the next step. Either way, there's nothing really wrong with blocking (as long as you are not holding an important resource while being blocked). The real problem with your solution is not blocking, but it being bursty, and starving executor threads of work. – Dima Jun 03 '17 at 19:05
  • @Dima No, that completely depends on thread pool implementation. The code itself does no blocking at all. – Viktor Klang Jun 03 '17 at 19:20
  • The code "does not do blocking" explicitly. In the same sense as `val x: java.lang.Integer = 1` does not do boxing. – Dima Jun 03 '17 at 19:43
  • @Dima This is pretty easy to understand, thanks! One question though: if you ran this on a machine with 200 cores, doesn't the fixed batch size limit you to 10 concurrent tasks? How would you deal with this? Make it a config value, or something else? – elo80ka Jun 03 '17 at 22:40
  • I think that if the main thread comes from the same ExecutionContext passed implicitly to `andThen` you may deadlock (same as @ectrop answer). BTW, @ViktorKlang is one of the authors of the Scala futures library. (It seems like you're unaware of this) – gilad hoch Jun 03 '17 at 23:34
  • @elo80ka yes, the `batchSize` should be configurable. It doesn't necessarily need to match the number of cores - if your actual tasks perform any blocking operations, it can be much higher than that. – Dima Jun 04 '17 at 12:43
  • @giladhoch yes, the default `ExecutionContext` in the Scala implementation is brain dead, it deadlocks easily, and shouldn't be used for anything serious. There are ways around it, but it is beyond the scope of this answer. Just assume the main thread is NOT from the same pool as what the EC uses. You are correct, I didn't know what @ViktorKlang is the author of, but, as you probably understand now, I don't really consider it much of a credential. The Scala futures lib is really poorly designed and implemented, which is evidenced by the fact that almost every major framework replaces it with its own. – Dima Jun 04 '17 at 12:46