I am as idly curious about meaningless microbenchmarks as the next hacker, so here is a demonstration of why the result is meaningful, why it matters where you put the par, and why the OP's conjecture was correct (even if the methodology was flawed):
scala> import System.nanoTime
import System.nanoTime
scala> def timed(op: =>Unit) = { val t0=nanoTime;op;println(nanoTime-t0) }
timed: (op: => Unit)Unit
scala> val data = (1 to 1000000).toList
data: List[Int] = List(1, 2, 3, 4,...
scala> timed(data.par)
85333715
scala> timed(data.par)
40952638
scala> timed(data.par)
40134628
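The first call is the slowest because it also pays for JIT compilation and classloading. A minimal warm-up harness (my addition, reusing the timed helper above; warmed is a hypothetical name) makes the steady-state cost easier to read:
def warmed(reps: Int)(op: => Unit): Unit = {
  (1 to reps) foreach (_ => op)  // throwaway runs so the JIT settles
  timed(op)                      // the run we actually report
}
warmed(50)(data.par)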
On my machine, constructing a small 10k list takes about the same time as calling par on it, around 400k nanos, which is why, in the green-checked answer, .toList.par rounds up to one millisecond and .toList rounds down to zero.
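To see why millisecond granularity produces that rounding, here is a sketch of the effect; timedMs is my own hypothetical helper, not the accepted answer's actual code:
def timedMs(op: => Unit): Long = {
  val t0 = System.currentTimeMillis
  op
  System.currentTimeMillis - t0
}
val r = 1 to 10000
timedMs(r.toList)      // ~0.4 ms of real work, per the numbers above, reports as 0
timedMs(r.toList.par)  // ~0.8 ms of real work reports as 1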
OTOH, constructing a large 1m list sequentially is more variable.
scala> 1 to 100 foreach (_ => timed((1 to 1000000).toList))
Occasionally a run loses a factor of ten somewhere. I haven't looked to see whether that is due to reallocations, garbage collection, memory architecture, or something else.
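One way to start separating garbage collection from the other suspects is to request a collection before each run. This is only suggestive, since System.gc is a hint the JVM may ignore:
1 to 100 foreach { _ =>
  System.gc()  // hint only; if the outliers vanish, GC was the likely culprit
  timed((1 to 1000000).toList)
}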
But it's interesting how easily this works:
scala> import scala.collection.parallel.immutable.ParVector
import scala.collection.parallel.immutable.ParVector
scala> 1 to 100 foreach (_ => timed((1 to 1000000).par.to[ParVector]))
The ParRange edges out the sequential Range in this test and is faster than data.par. (On my machine.)
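A plausible reason (my inference, not something measured above) is that calling par on a Range only has to wrap its bounds in a ParRange, while par on a List must copy a million cells into a parallel sequence. The runtime classes show the difference:
println((1 to 1000000).par.getClass)  // ParRange: a cheap wrapper over the bounds
println(data.par.getClass)            // a copied structure (ParVector, I believe)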
What's interesting to me is that there is no computation to parallelize here. This must mean that it's inexpensive to assemble a ParVector in parallel. Compare this other answer, where the costs of assembly in a parallel groupBy were surprising to me as a ParNewbie.
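For contrast, here is a sketch of the kind of groupBy comparison that other answer discussed; this is my reconstruction, not its actual code:
// Grouping in parallel builds per-chunk maps that must then be merged,
// so assembly cost can dominate when the per-element work is trivial.
val xs = (1 to 1000000).toVector
timed(xs.groupBy(_ % 100))      // sequential groupBy
timed(xs.par.groupBy(_ % 100))  // parallel groupBy, merge cost included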