12

I've recently been learning about various libraries for concurrency in Java, such as ConcurrentHashMap and the lovely non-blocking one from Cliff Click.

I don't know much about Scala but I've heard good things about the recent parallel collections library.

What major advantages would this library offer over Java-based libraries?

Jed Wesley-Smith
barrymac

3 Answers

26

The two kinds of collections serve different purposes.

Java's concurrent collections allow you to use them from a parallel context: many threads can access them simultaneously, and the collection will be sure to do the right thing (so the callers don't have to worry about locks and such).
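As a minimal sketch of that point (the class name `SharedCounterSketch`, the helper `countHits`, and the "even"/"odd" keys are hypothetical, chosen only for illustration), several threads can update one `ConcurrentHashMap` with no locking in the calling code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SharedCounterSketch {
    // Hypothetical helper: `threads` workers each record `perThread` hits
    // in a single shared map, alternating between two keys.
    static Map<String, Integer> countHits(int threads, int perThread) {
        Map<String, Integer> hits = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    String key = (i % 2 == 0) ? "even" : "odd";
                    // merge is atomic on ConcurrentHashMap, so the caller
                    // needs no synchronized block around the update.
                    hits.merge(key, 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(countHits(4, 1000));
    }
}
```

Note that the threads are still created and managed by the caller here; the collection only guarantees that the simultaneous updates are not lost.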

Scala's parallel collections, in contrast, are designed to run higher-order operations on themselves without you having to worry about creating threads. So you can write something like:

myData.par.filter(_.expensiveTest()).map(_.expensiveComputation())

and the filter and map will each execute in parallel (but the filter will complete before the map starts).
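For readers coming from the Java side, the closest standard-library counterpart I know of is the parallel stream API in Java 8 and later. It is a different library, not Scala's parallel collections. The sketch below is only an analogy: `expensiveTest` and `expensiveComputation` are hypothetical stand-ins for real work, and the class name is made up. One difference worth knowing is that streams are lazy, so the filter and map stages are generally fused into one pass rather than the filter finishing first.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ParallelPipelineSketch {
    // Hypothetical stand-in for an expensive predicate.
    static boolean expensiveTest(int n) {
        return n % 3 == 0;
    }

    // Hypothetical stand-in for an expensive transformation.
    static long expensiveComputation(int n) {
        return (long) n * n;
    }

    // Filters and maps 1..upTo in parallel; no threads are created by hand.
    static List<Long> run(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .parallel()
                .filter(ParallelPipelineSketch::expensiveTest)
                .mapToObj(ParallelPipelineSketch::expensiveComputation)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // prints [9, 36, 81]
    }
}
```

The result keeps the source order even though the work is split across threads, because the stream is ordered and `collect` respects that order.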

Rex Kerr
  • Could I conclude that the Scala collections library is more of a parallelisation DSL like Groovy's GPars rather than a collections API per se? – barrymac Jun 02 '11 at 12:50
  • 2
    @barrymac - No, you could not conclude that. It's a full library with all the typical folds, maps, and filters that you expect from a functional language. The parallel collections are only part of the library; they operate in parallel whenever possible, but completely transparently. There's no "DSL" because the syntax for most operations is exactly the same as for any other collection, to the point where you could, say, map on a superclass (e.g. `GenSeq`) and not know or care whether the map happens serially or in parallel. – Rex Kerr Jun 02 '11 at 12:57
  • 1
    Excellent stuff. This is one of the most compelling things I have heard for getting more into scala! – barrymac Jun 02 '11 at 13:16
7

To extend Rex's answer a little bit: the reason why Java-style concurrently modifiable collections are not very interesting in Scala is its bias toward immutable data. The most common way to implement concurrency in Scala is the actor model (which relies on immutable data), not threads.

Landei
  • I don't really agree--I use the Java collections all the time even with actors; I try to minimize the amount of shared state that must be accessed this way, but if a process needs to be globally synchronized, it's both easier and lower-overhead to use a synchronous collection than a whole extra actor who is only keeping track of global state. – Rex Kerr Jun 01 '11 at 18:39
  • 1
    I don't say synchronous collections are not convenient in Scala, but I think they don't play the central role in concurrent programming as they do in Java, where the only other built-in choice you have is to synchronize your code yourself. – Landei Jun 01 '11 at 19:02
  • I agree they're not as central; I just think they're still interesting. If they were gone, I would miss them sorely, but only a small fraction of my code would be impacted. – Rex Kerr Jun 01 '11 at 19:12
5

In addition to Rex Kerr's answer above about concurrent and parallel collections serving two different purposes, I would add that Java actually has a parallel array implementation by Doug Lea in the extra JSR 166 package. This collection allows bulk operations to be performed on the array elements, but it is not suitable for concurrent access without explicit synchronization. One big difference here is that Scala parallel collections have parallel implementations for other collections as well, and not just arrays. These are:

  • ParVector
  • ParRange
  • mutable.ParHashMap
  • mutable.ParHashSet
  • immutable.ParHashMap
  • immutable.ParHashSet

All of the sequential variants of these collections can be directly converted into their parallel counterparts (via the `par` method). Other sequential collections can be converted into some of the above collections in linear time with respect to collection size.

Some additional data structures are on the way for future releases, including some parallel collections which will also allow concurrent access.

axel22
  • It would be nice if you mentioned what the Java parallel array implementation was. – Rex Kerr Jun 01 '11 at 22:05
  • 5
    The link to the parallel array itself is http://gee.cs.oswego.edu/dl/jsr166/dist/extra166ydocs/ and is probably terrifying enough to convince people that you want to do it the Scala way, not the Java way. – Rex Kerr Jun 01 '11 at 22:32
  • I'd add that there's even more magic here than first appears. If you have a `for/yield` that iterates over a Range and you turn this into a parallel Range -- and you use immutable data structures in the `yield` -- you instantly have a parallelized loop. It's amazing, really. – Wayne May 07 '16 at 02:37