Here's some code I shared with my study group yesterday: https://gist.github.com/natemurthy/019e49e6f5f0d1be8719. After compiling map.scala, I run it with the following heap settings:
$ scala -J"-Xmx4G" map
and get the following results for 4 separate tests:
// (1L to 20000000L).map(_*2)
(Map) multiplying 20 million elements by 2
(Reduce) sum: 400000020000000
Total MapReduce time: 7.562381
// (1L to 20000000L).toArray.map(_*2)
(Map) multiplying 20 million elements by 2
(Reduce) sum: 400000020000000
Total MapReduce time: 1.233997
// (1L to 20000000L).toVector.map(_*2)
(Map) multiplying 20 million elements by 2
(Reduce) sum: 400000020000000
Total MapReduce time: 15.041896
// (1L to 20000000L).par.map(_*2)
(Map) multiplying 20 million elements by 2
(Reduce) sum: 400000020000000
Total MapReduce time: 18.586220
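The gist isn't reproduced in this post, so here is a minimal sketch of what each test is assumed to do. The object and method names (`MapReduceSketch`, `viaRange`, `viaArray`, `viaVector`, `time`) are hypothetical and not taken from the gist. Each pipeline is timed with `System.nanoTime`. The `.par` variant is left out so the sketch also compiles on Scala 2.13+, where parallel collections moved to the separate scala-parallel-collections module. On 2.11, `(1L to n).par.map(_ * 2).sum` works without extra dependencies.

```scala
// Hypothetical harness: assumed to mirror the four tests, not the gist's actual code.
object MapReduceSketch {
  // Evaluate `body` once and return (result, elapsed wall-clock seconds).
  def time[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    val elapsed = (System.nanoTime() - start) / 1e9
    (result, elapsed)
  }

  // "Map" doubles every element, "Reduce" sums the doubled values.
  def viaRange(n: Long): Long  = (1L to n).map(_ * 2).sum
  def viaArray(n: Long): Long  = (1L to n).toArray.map(_ * 2).sum
  def viaVector(n: Long): Long = (1L to n).toVector.map(_ * 2).sum

  // Time one variant and print its sum and elapsed time.
  def report(label: String, run: => Long): Unit = {
    val (sum, secs) = time(run)
    println(f"$label%-6s sum: $sum%d, time: $secs%.6f s")
  }

  def main(args: Array[String]): Unit = {
    val n = 20000000L
    report("Range", viaRange(n))
    report("Array", viaArray(n))
    report("Vector", viaVector(n))
  }
}
```

As a sanity check on the reduce step, the sum of 2k for k = 1..n is n(n + 1), and 20000000 × 20000001 = 400000020000000, which matches all four outputs. A single timed pass like this includes JIT compilation and GC pauses, so the sketch should be run with the same `-J"-Xmx4G"` setting and repeated a few times before comparing numbers.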
I'm trying to figure out why these results vary so much across collection types and, more importantly, why performance appears to be worse for collections that should intuitively be faster (the parallel collection in particular is the slowest of the four). I'd be curious to hear your insights into these results. I've also experimented with performing these operations in Breeze and Saddle (both of which perform much better on the same tests), but I want to see how far I can push the built-in Scala Collections API.
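One way to probe how far the built-in API can go is to compare against variants that never build the 20-million-element intermediate collection. The sketch below is an assumption about what is worth comparing, not a claim about which is fastest; the names `FusedVariants`, `viaIterator`, `viaFoldLeft`, and `viaWhileLoop` are hypothetical. No timings are included; each method could be wrapped in whatever timing code the gist uses.

```scala
// Hypothetical variants that stay within the standard library.
object FusedVariants {
  // Iterator pipeline: map and sum are evaluated lazily, element by element,
  // so no intermediate collection of doubled values is materialized.
  def viaIterator(n: Long): Long = (1L to n).iterator.map(_ * 2).sum

  // foldLeft directly on the range: doubling and summing in a single pass,
  // again without building an intermediate collection.
  def viaFoldLeft(n: Long): Long = (1L to n).foldLeft(0L)((acc, x) => acc + x * 2)

  // Plain while loop over primitive longs: no collection and no boxing,
  // useful as a baseline for the collection-based versions.
  def viaWhileLoop(n: Long): Long = {
    var i = 1L
    var acc = 0L
    while (i <= n) {
      acc += i * 2
      i += 1
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    val n = 20000000L
    val expected = n * (n + 1) // closed form for the sum of 2k, k = 1..n
    val results = Seq(
      "iterator" -> viaIterator(n),
      "foldLeft" -> viaFoldLeft(n),
      "while"    -> viaWhileLoop(n))
    for ((label, result) <- results)
      println(s"$label: $result (matches closed form: ${result == expected})")
  }
}
```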
These tests were run on an Asus Zenbook UX31A (Intel Core i7-3517U, 1.9 GHz, dual-core with Hyper-Threading, 4 GB RAM) running Ubuntu 12.04 Desktop, using Scala 2.11.1 with JDK 1.7.