I have a collection on which I call .par, like this:

myCollection.par.map(element => longRunningOperation(element)).seq
println("after map")

Will calling .seq guarantee that all threads are joined, and that every map operation has completed, before println is called?

axel22
Geo
  • Isn't this condition already satisfied when `map` has completed, anyway? – Jean-Philippe Pellet Aug 19 '11 at 10:54
  • Um, I don't know, but I don't think so. I think that once you call `.par`, you'd only deal with parallel collections until you make them into sequential ones. If I don't add the `seq`, the interpreter continues executing the code after the `par` line. – Geo Aug 19 '11 at 10:57
  • @Geo You are confusing parallelism with futures. – Daniel C. Sobral Aug 19 '11 at 12:17

2 Answers

The worker threads are launched once the map operation is invoked. They are all joined by the framework before the map operation completes. By the time you call seq there are no more running worker threads.
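
As a rough illustration of the point above, the sketch below counts completed operations and reads the counter immediately after map returns. The `JoinDemo` object, the `completed` counter, and this body of `longRunningOperation` are hypothetical stand-ins, not the asker's code. It assumes a Scala version where `.par` is part of the standard library (2.9 through 2.12); on 2.13 and later the separate scala-parallel-collections module and the commented-out import would be needed.

```scala
import java.util.concurrent.atomic.AtomicInteger

// On Scala 2.13+, .par lives in the separate scala-parallel-collections module:
// import scala.collection.parallel.CollectionConverters._

object JoinDemo {
  // Hypothetical counter: records how many operations have finished.
  val completed = new AtomicInteger(0)

  // Hypothetical stand-in for the question's longRunningOperation.
  def longRunningOperation(element: Int): Int = {
    Thread.sleep(50)            // simulate slow work
    completed.incrementAndGet() // mark this element as done
    element * 2
  }

  def run(): (Int, Seq[Int]) = {
    completed.set(0)
    val myCollection = (1 to 8).toVector
    val results = myCollection.par.map(element => longRunningOperation(element))
    // map has returned, so the counter is read only after every element was processed.
    (completed.get, results.seq)
  }

  def main(args: Array[String]): Unit = {
    val (done, results) = run()
    println("after map: " + done + " of 8 operations completed")
    println(results)
  }
}
```

If map returned before the workers had finished, the counter read in `run` could be lower than the collection size; under the behaviour described in this answer it should always equal it.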

axel22

Yes, it will. In fact, you don't need to call .seq at the end for that purpose: map itself does not return until it has processed every element.

The easy way to answer questions like this is to remember that, in the absence of side effects, parallel collections have exactly the same semantics as non-parallel collections. Unless the code in your longRunningOperation has visible side effects, the only way you'll be able to tell that the code is being run in parallel is by checking processor utilization.
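
A minimal sketch of that equivalence, assuming a pure function (the `SameSemantics` object and `square` are made-up names for illustration) and a Scala version where `.par` is built in (2.9 through 2.12; 2.13+ would need the scala-parallel-collections module and the commented-out import):

```scala
// On Scala 2.13+, .par lives in the separate scala-parallel-collections module:
// import scala.collection.parallel.CollectionConverters._

object SameSemantics {
  // Hypothetical pure function: no shared state, no I/O.
  def square(n: Int): Int = n * n

  // Runs the same map sequentially and in parallel and compares the results.
  def resultsMatch(): Boolean = {
    val input = (1 to 1000).toVector
    val sequential = input.map(square)
    val parallel = input.par.map(square).seq
    sequential == parallel // same elements, same order
  }

  def main(args: Array[String]): Unit =
    println(resultsMatch())
}
```

The comparison only holds because `square` has no side effects. A function that mutated shared state or printed output could behave differently under `.par`, since elements may be processed in any order.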

Dave Griffith
  • What is considered `side effects` in parallel collection context? – Geo Aug 19 '11 at 12:50
  • Same as anywhere else: anything which modifies an object's state or performs I/O. Actually, there is an oddity in your example, in that you are throwing away the values returned by your longRunningOperation. If you intend to execute those for their side effects, you should be using .foreach rather than .map. In either case, execution of the parallel collection operation will run to completion before your call to println. – Dave Griffith Aug 19 '11 at 13:42
  • Ah, I only pasted part of it. I'm actually keeping the results. Also, I'm performing IO in my long running operation. Should I handle something differently? – Geo Aug 19 '11 at 14:04
  • Depends. If you don't mind that the I/O you perform in your long-running operations could occur in any order, then you're fine. If that would be catastrophic, then you probably shouldn't be using parallel collections. – Dave Griffith Aug 19 '11 at 14:14
  • I'm doing IO on separate files. I aggregate the results only after all threads finish. This is why it's important that the worker threads have finished. – Geo Aug 19 '11 at 14:58
  • Doing that is perfectly fine. – soc Aug 19 '11 at 15:35
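
The pattern discussed in these comments (each element does I/O on its own file, and results are aggregated only after the parallel map returns) could look roughly like the sketch below. The `PerFileIo` object, `writeOwnFile`, and the temp-file naming are hypothetical, and the code assumes a Scala version where `.par` is built in (2.9 through 2.12; 2.13+ would need the scala-parallel-collections module and the commented-out import).

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// On Scala 2.13+, .par lives in the separate scala-parallel-collections module:
// import scala.collection.parallel.CollectionConverters._

object PerFileIo {
  // Hypothetical operation: each element writes to its own temp file,
  // so no two workers ever touch the same file.
  def writeOwnFile(id: Int): Path = {
    val path = Files.createTempFile("element-" + id + "-", ".txt")
    Files.write(path, ("result " + id).getBytes(StandardCharsets.UTF_8))
    path
  }

  def run(): Seq[String] = {
    // The writes may happen in any order, but map returns only after all of them are done.
    val paths = (1 to 4).toVector.par.map(writeOwnFile).seq
    // Aggregation step: safe here because every file has already been written.
    val contents = paths.map(p => new String(Files.readAllBytes(p), StandardCharsets.UTF_8))
    paths.foreach(p => Files.delete(p)) // clean up the temp files
    contents
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```

The results are kept with map, as Geo describes. If the operation were run purely for its side effects, foreach would be the more natural choice, as suggested in the earlier comment.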