Suppose I have an array of ~10K elements and I need to process all elements of the array. I would like to process them in such a way that only K elements are processed in parallel.

I use Scala 2.9. I tried parallel collections (see below) but I saw more than K elements processed in parallel.

import collection.parallel.ForkJoinTasks.defaultForkJoinPool._
val old = getParallelism
setParallelism(K)
val result = myArray.par.map(...) // process the array in parallel
setParallelism(old)

How would you suggest processing an array in Scala 2.9 so that only K elements are processed in parallel?

Michael

1 Answer

The setParallelism method sets the recommended number of parallel workers that the fork/join pool of the parallel collection is supposed to use. Those K workers may work on any part of the collection; it is up to the scheduler to decide which elements each worker is assigned.
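To make the "K workers over the whole collection" idea concrete, here is a minimal sketch that swaps in a fixed-size `java.util.concurrent` thread pool instead of the parallel-collections pool discussed above. The object `BoundedParallelism` and the helper `mapWithAtMost` are hypothetical names introduced only for illustration. The sketch maps a function over every element while recording the highest number of calls that were in flight at once, which should never exceed `k`.

```scala
import java.util.concurrent.{Callable, Executors}
import java.util.concurrent.atomic.AtomicInteger

object BoundedParallelism {
  // Hypothetical helper (not part of the Scala library): applies f to every
  // element using a pool of exactly k threads. Returns the results in input
  // order together with the peak number of concurrently running calls.
  def mapWithAtMost[A, B](items: Seq[A], k: Int)(f: A => B): (List[B], Int) = {
    val pool = Executors.newFixedThreadPool(k)
    val running = new AtomicInteger(0) // calls currently executing
    val peak = new AtomicInteger(0)    // highest value of `running` seen so far
    try {
      // Submit one task per element; the pool runs at most k of them at a time.
      val futures = items.toList.map { item =>
        pool.submit(new Callable[B] {
          def call(): B = {
            val now = running.incrementAndGet()
            // Raise the recorded peak if this call pushed concurrency higher.
            var seen = peak.get()
            while (now > seen && !peak.compareAndSet(seen, now)) seen = peak.get()
            try f(item) finally running.decrementAndGet()
          }
        })
      }
      // Block until every task has finished, preserving input order.
      (futures.map(_.get()), peak.get())
    } finally pool.shutdown()
  }
}
```

A call such as `BoundedParallelism.mapWithAtMost(myArray, K)(process)`, where `process` stands for whatever per-element function you have, processes all elements while keeping at most K of them in flight. Which elements run together is still decided by the pool, not by the caller.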

If you would like to include only the first K elements in the parallel operation, you should use the take method, followed by a map:

myArray.par.take(K).map(...)

Alternatively, you can use view.take(K).map(...).force, which creates a parallel view and only performs the mapping when the view is forced.
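The difference between a strict pipeline and a view is easiest to see by counting how often the mapping function runs. The following is a minimal sketch on an ordinary (sequential) array, not a parallel one. The names `TakeVersusView`, `Counts`, and `demo` are hypothetical, and `toList` is used to materialize the view in the role that `force` plays above. The map is deliberately placed before the take so that the intermediate collection matters.

```scala
object TakeVersusView {
  // Results of both pipelines plus the number of times the mapper was invoked.
  final case class Counts(strict: List[Int], strictCalls: Int,
                          viaView: List[Int], viewCalls: Int)

  def demo(k: Int): Counts = {
    val data = (1 to 10).toArray

    // Strict: map builds a full intermediate array before take trims it.
    var strictCalls = 0
    val strict = data.map { x => strictCalls += 1; x * 2 }.take(k).toList

    // View: map and take are only recorded; elements are computed when the
    // view is materialized by toList.
    var viewCalls = 0
    val viaView = data.view.map { x => viewCalls += 1; x * 2 }.take(k).toList

    Counts(strict, strictCalls, viaView, viewCalls)
  }
}
```

Both pipelines yield the same elements, but the strict one invokes the mapper for every element of the array, while the view only needs to evaluate the elements that survive the `take`. Whether that saving outweighs the indirection a view adds depends on the transformations involved.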

axel22
  • Thanks. Could you please elaborate a bit on using `view`? – Michael Jun 26 '13 at 15:11
  • The idea behind views is to postpone evaluating the intermediate collections that would otherwise be created in memory using `take`, `map` or `filter` until the collection is `force`d. Due to certain abstraction penalties and indirections, this may or may not increase performance, depending on what the transformations are. You can read more about it here: http://docs.scala-lang.org/overviews/collections/views.html – axel22 Jun 26 '13 at 15:33