
I'm trying to use parallel collections in a very basic way via `.par`. I expect the collection to be acted on out of order, but that doesn't seem to be the case:

scala> (1 to 10) map println
1
2
3
4
5
6
7
8
9
10

and

scala> (1 to 10).par map println
1
2
3
4
5
6
7
8
9
10

It seems like the order shouldn't be sequential in the latter case. This is with Scala 2.9 on a machine with 2 cores. Is this perhaps a misconfiguration somewhere? Thanks!

Edit: I did indeed try running with a large set (100k) and the result was still sequential.
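A quick way to check whether more than one thread is actually doing the work (a diagnostic sketch, not required for the question) is to print the worker thread's name alongside each element; if only one pool-thread name ever appears, the work is effectively running on a single thread:

```scala
// Print the worker thread's name next to each element. Several
// distinct pool-thread names in the output would confirm that the
// work really is being distributed across threads.
object ThreadCheck extends App {
  (1 to 10).par foreach { i =>
    println(Thread.currentThread.getName + " -> " + i)
  }
}
```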

Heinrich Schmetterling
  • I don't know any Scala, but perhaps you can use a bigger range to ensure multiple processor usage? Write a little Ruby script to generate 1 to 10,000,000 and try parallelizing again in Scala. Then compare the two. See what happens. – Mike Jun 03 '11 at 01:44
  • Try with `(1 to 100)` see if that changes your results. – huynhjl Jun 03 '11 at 01:54
  • @huynhjl @mike I tried a much larger set (100k) and it was sequential every time. – Heinrich Schmetterling Jun 03 '11 at 03:08
  • Try `(1 to 10).par map { i => Thread.sleep(1000); println(i) }` – huynhjl Jun 03 '11 at 05:16
  • 1
    Also what does `Runtime.getRuntime.availableProcessors` return? – huynhjl Jun 03 '11 at 05:44
  • availableProcessors returns 2. If I do `(1 to 10).par map { i => Thread.sleep(1000); println(i) }`, it gives me "1 6 2 7 3 8 4 9 5 10". – Heinrich Schmetterling Jun 04 '11 at 09:14
  • Is it possible that in the non-sleep case it's only using one thread, but if it's sleeping, it uses two? – Heinrich Schmetterling Jun 04 '11 at 09:27
  • The `sleep(1000)` proves that there is nothing inherent in your setup that prevents concurrent processing. I added a link to `ForkJoinPool` in my answer, which explains how tasks are scheduled. It is surprising that you don't see parallel processing on (1 to 100000), but I'm not sure how to troubleshoot further. – huynhjl Jun 04 '11 at 16:08

2 Answers


YMMV:

scala> (1 to 10).par map println
1
6
2
3
4
7
5
8
9

This is on a dual core too...

I think if you run it enough times you may see different results. Here is a piece of code that shows some of what happens:

import collection.parallel._
import collection.parallel.immutable._

class ParRangeEx(range: Range) extends ParRange(range) {
  // Minimal number of elements below which this range is processed
  // sequentially instead of being split further across processors.
  override def threshold(sz: Int, p: Int) = {
    val res = super.threshold(sz, p)
    printf("threshold(%d, %d) returned %d\n", sz, p, res)
    res
  }
  override def splitter = {
    new ParRangeIterator(range) 
        with SignalContextPassingIterator[ParRangeIterator] {
      override def split: Seq[ParRangeIterator] = {
        val res = super.split
        println("split " + res) // probably doesn't show further splits
        res
      }
    }
  }
}

new ParRangeEx(1 to 10) map println

Some runs I get interspersed processing, some runs I get sequential processing. It seems to split the load in two. If you change the returned threshold number to 11, you'll see that the workload will never be split.

The underlying scheduling mechanism is based on fork-join and work stealing. See the JSR166 `ForkJoinPool` source code for some insight. This is probably what determines whether the same thread picks up both tasks (and thus appears sequential) or a separate thread works on each task.
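On 2.9 the parallel collections share a single default fork-join pool, and (as an assumption based on that version's API, later replaced by `TaskSupport` in 2.10+) you can inspect or change its parallelism level directly. A sketch, valid for 2.9 only:

```scala
// Scala 2.9-specific: parallel collections draw workers from one
// shared fork-join pool exposed via ForkJoinTasks.
import collection.parallel.ForkJoinTasks

val pool = ForkJoinTasks.defaultForkJoinPool
println("parallelism: " + pool.getParallelism)

// Forcing parallelism to 1 should make every run look sequential,
// which is a useful baseline when debugging ordering behavior.
// pool.setParallelism(1)
```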

Here is an example output on my computer:

threshold(10, 2) returned 1
split List(ParRangeIterator(over: Range(1, 2, 3, 4, 5)), 
  ParRangeIterator(over: Range(6, 7, 8, 9, 10)))
threshold(10, 2) returned 1
threshold(10, 2) returned 1
threshold(10, 2) returned 1
threshold(10, 2) returned 1
threshold(10, 2) returned 1
6
7
threshold(10, 2) returned 1
8
1
9
2
10
3
4
5
huynhjl

The output can very well come out sequentially; there is just no guarantee of it. On such a small set you would normally get it sequentially. The `println`, however, involves a system call; if you run it enough times you will probably get a jumbled version.
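Worth noting: only the side effects are unordered. The result of a parallel `map` is still assembled in the original element order, so ordering only becomes visible when you depend on the order of effects like `println`. A small sketch (`.seq` converts back to a sequential view):

```scala
// The printlns from a parallel map may interleave arbitrarily, but
// the mapped result itself always preserves the input order.
val res = (1 to 10).par.map(_ * 2).seq.toList
assert(res == List(2, 4, 6, 8, 10, 12, 14, 16, 18, 20))
```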

thoredge
  • Strange, that hasn't been the case; I've run it a dozen times and tried a much larger set, and the result was still sequential. – Heinrich Schmetterling Jun 03 '11 at 03:07
  • @HeinrichSchmetterling From the operating system's point of view, each write to stdout is an atomic unit as long as it's < PAGE_SIZE (typically 4 kB), so it won't be jumbled. That's a POSIX guarantee. – user239558 May 14 '18 at 13:04