
So I've been working with parallel collections in Scala for a graph project. I've got the basics of the graph class defined; it currently uses a scala.collection.mutable.HashMap where the key is Int and the value is a ListBuffer[Int] (an adjacency list). (EDIT: This has since been changed to ArrayBuffer[Int].)

I had done a similar thing a few months ago in C++, with a std::vector<std::vector<int> >.

What I'm trying to do now is run a metric between all pairs of vertices in the graph, so in C++ I did something like this:

// myVec = std::vector<int> of vertices
for (std::vector<int>::iterator iter = myVec.begin(); iter != myVec.end(); ++iter) {
    for (std::vector<int>::iterator iter2 = myVec.begin(); 
        iter2 != myVec.end(); ++iter2) {
        /* Run algorithm between *iter and *iter2 */
    }
}

I did the same thing in Scala, parallelized (or tried to), by doing this:

// vertexList is a List[Int] (NOW CHANGED TO Array[Int] - see below)
vertexList.par.foreach(u =>
  vertexList.foreach(v =>
    /* Run algorithm between u and v */
  )
)

The C++ version is clearly single-threaded; the Scala version has .par, so it's using parallel collections and is multi-threaded on 8 cores (same machine). However, the C++ version processed 305,570 pairs in roughly 3 days, whereas the Scala version has so far processed only 23,573 pairs in 17 hours.

Assuming I did my math correctly, the single-threaded C++ version is roughly 3x faster than the parallel Scala version. Is Scala really that much slower than C++, or am I completely misusing Scala (I only recently started; I'm about 300 pages into Programming in Scala)?

Thanks! -kstruct

EDIT To use a while loop, do I do something like this?

// Where vertexList is an Array[Int]
vertexList.par.foreach { u =>
  var i = 0
  while (i < vertexList.length) {
    /* Run algorithm between u and vertexList(i) */
    i += 1
  }
}

If you guys mean use a while loop for the entire thing, is there an equivalent of .par.foreach for whiles?

EDIT2 One thing I'm still unsure of: if I have some var i that keeps track of the iteration, wouldn't all threads be sharing that i? Or does each thread get its own i because it's declared inside the closure?

adelbertc
  • That seems far too slow to me. However, it's hard to know without more information. How long does it take the inner loop to complete on average? I would try profiling the Scala application, single-threaded, using YourKit to see if something in your algorithm is taking surprisingly long. – schmmd Mar 16 '12 at 19:02
  • Try to profile it. My guess is that it's because of boxing. `ListBuffer[Int]` can only store boxed integers. Try switching to `Array[Int]`. – ziggystar Mar 16 '12 at 19:15
  • It would need to be an ArrayBuffer[Int] right, since the user may decide to add/delete edges? – adelbertc Mar 16 '12 at 19:37
  • See @higherkinded's preso https://docs.google.com/present/view?id=ddmmbr8g_11fp6dq96s – oluies Mar 16 '12 at 20:14
  • Good read.. however, how would I parallelize `while` loops? – adelbertc Mar 16 '12 at 20:26
  • This brings up another question, should I then just use `var i = 0; while (i != someBound) /* stuff */ i += 1;` instead of `for (i <- 0 to someBound)` for efficiency, or am I misinterpreting those slides? – adelbertc Mar 16 '12 at 20:38
  • See something like http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504/ref=sr_1_1?s=books&ie=UTF8&qid=1296043652&sr=1-1 – oluies Mar 16 '12 at 20:42
  • http://stackoverflow.com/questions/2443885/graph-library-for-scala – oluies Mar 16 '12 at 20:43

3 Answers


From your comments, I see that you're updating a shared mutable HashMap at the end of each algorithm run. And if you're randomizing your walks, a shared Random is also a contention point.

I recommend a few changes:

  1. Use .map and .flatMap to return an immutable collection instead of modifying a shared collection.
  2. Use a ThreadLocalRandom (from either Akka or Java 7) to reduce contention on the random number generator.
  3. Check the rest of your algorithm for further possible contention points.
  4. You may try running the inner loop in parallel, too. But without knowing your algorithm, it's hard to know whether that will help or hurt. Fortunately, running all combinations of parallel and sequential collections is very simple; just switch out pVertexList and vertexList in the code below.

Something like this:

val pVertexList = vertexList.par
val allResult = for {
  u <- pVertexList
  v <- pVertexList
} yield {
  /* Run algorithm between u and v */
  ((u -> v) -> result)
}

The value allResult will be a ParVector[((Int, Int), Int)]. You may call .toMap on it to convert that into a Map.
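For the ThreadLocalRandom suggestion, here is a minimal sketch of the pattern using java.util.concurrent.ThreadLocalRandom (Java 7+); the bound of 100 and the `randomStep` name are just illustrative stand-ins for one step of your walk:

```scala
import java.util.concurrent.ThreadLocalRandom

// Fetch the current thread's generator inside each task; never cache
// it in a shared field, since the instance is bound to one thread.
def randomStep(bound: Int): Int =
  ThreadLocalRandom.current().nextInt(bound)

val steps = Vector.fill(1000)(randomStep(100))
```

Calling current() inside the task body is what avoids the lock contention a single shared java.util.Random would cause across threads.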

leedm777
  • I think making it "more parallel" will be counter-productive. +1 for the contention idea. – Daniel C. Sobral Mar 17 '12 at 02:11
  • Hm.. how could I determine if there is contention (not familiar with this concept, just did a quick Wikipedia search)? My algorithm is essentially a random walk between pairs of vertices, not much else... Also, what is the difference between doing what you posted and doing a "plain" `vertexList.par.foreach(u => vertexList.foreach(v => ...` ? – adelbertc Mar 17 '12 at 03:46
  • @kstruct It must be doing something with those random walks, such as storing data somewhere or printing it out. When you have mutable state that's shared between threads, you have to have [some mechanism](http://docs.oracle.com/javase/tutorial/essential/concurrency/) to make sure that writes don't interfere with each other, and reads don't read half-written data. Multi-threaded programming + shared mutable state is _extremely_ difficult to do correctly and reason about, and would completely destroy any benefit from parallel collections. – leedm777 Mar 17 '12 at 18:16
  • re: the difference between my code and yours, `.par` does not modify the original collection, but returns a new parallel collection. In my version both the inner and outer loop are parallel; in yours only the outer loop is. As @DanielC.Sobral said, though, if there's contention in the algorithm, then all the parallelism in the world won't help you. – leedm777 Mar 17 '12 at 18:21
  • @kstruct As far as finding contention, look for the use of shared mutable state, such as I/O, shared objects, mutexes, synchronized methods, etc. – leedm777 Mar 17 '12 at 18:28
  • Yeah, once each random walk terminates I store it in a HashMap[Tuple2[Int, Int], Int] where the key is the two vertices and the value is the random walk. Regarding your change, would that all I would need to do to potentially make my code more efficient/parallel? – adelbertc Mar 17 '12 at 18:30
  • Probably not. It's the shared, mutable `HashMap` that's the likely culprit. Will update answer accordingly. – leedm777 Mar 17 '12 at 20:04
  • In regards to I/O as contention, how can I change that? I'm using I/O to print something like "Starting random walk between u and v" before I start each random walk, just to let me know that the code is progressing - what would be a better way to do this? Likewise, how about `ThreadLocalRandom` from Java 7? The "official" Java version that is supported on Mac OS seems to be 6, not 7.. what's the best way to get 7 installed? Or should I be using a different way to get `ThreadLocalRandom`? Thanks! – adelbertc Mar 19 '12 at 03:49
  • @kstruct Java 7 on OS X is [getting there](https://wikis.oracle.com/display/OpenJDK/Mac+OS+X+Port+Project+Status), but instead I'd recommend just bringing in [Akka 2.0](http://doc.akka.io/docs/akka/2.0/). It provides a [ThreadLocalRandom](http://doc.akka.io/api/akka/2.0/#akka.jsr166y.ThreadLocalRandom) you can use w/ Java 6. Re: contention on stdout, you could create an [Akka Actor](http://doc.akka.io/docs/akka/2.0/scala/actors.html) to handle your output in a way that minimizes contention. But I recommend testing to see if that helps or hurts; that's the only way to know for sure. – leedm777 Mar 19 '12 at 04:37
  • Thanks for that Java 7 website, really helpful. Regarding Akka 2.0, is there any advantage of just using the Typesafe stack (http://typesafe.com/stack) or just pure Akka 2.0? Does the Typesafe stack provide anything that would be helpful? – adelbertc Mar 19 '12 at 07:29
  • I just downloaded Typesafe.. the website seems to say that it comes with Scala, Akka, Play, sbt, etc. but the download only seems to include sbt.. are we to install Scala and Akka ourselves? – adelbertc Mar 19 '12 at 08:00
  • @kstruct I recommend just using [SBT](https://github.com/harrah/xsbt/wiki/Getting-Started-Welcome). It will automagically download Scala, Akka and any other dependencies you declare in your project. Akka has its own [Getting Started guide](http://doc.akka.io/docs/akka/2.0/intro/getting-started.html). – leedm777 Mar 19 '12 at 16:23
  • So I got SBT through Typesafe.. 1. To delete an SBT project, do I just need to do `rm -r ` for that project, or are there other SBT generated hidden files I can delete? 2. I deleted my original scala folder in my attempt to test out SBT, and it seems `sbt run` still compiles the scala files just fine, but if I do a straight `scala` on the command line the REPL does not come up.. does SBT hide `scala` from the user? – adelbertc Mar 19 '12 at 17:02
  • Alright I got ThreadLocalRandom in, soooo much faster. Thanks a lot! – adelbertc Mar 19 '12 at 18:06

Why mutable? I don't think there's a good parallel mutable map in Scala 2.9.x -- particularly since just such a data structure was added to the upcoming Scala 2.10.

On the other hand... you have a List[Int]? Don't use that; use a Vector[Int]. Also, are you sure you aren't wasting time elsewhere, doing the conversions from your mutable maps and buffers into immutable lists? Scala data structures are different from C++'s, so you might well be incurring complexity problems elsewhere in the code.

Finally, I think dave might be onto something when he asks about contention. If you have contention, parallelism might well make things slower. How much faster or slower does it run if you do not make it parallel? If making it sequential makes it faster, then you most likely do have contention issues.
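One rough way to run that check, sketched with a throwaway workload (the squaring stands in for the real algorithm, and the names are illustrative): time the same work with and without .par and compare. System.nanoTime is coarse and the JIT needs warm-up, so repeat the measurement several times and ignore the first few runs.

```scala
// Naive timing helper: returns the result and the elapsed nanoseconds.
def time[A](body: => A): (A, Long) = {
  val t0 = System.nanoTime()
  val result = body
  (result, System.nanoTime() - t0)
}

val vertexList = Vector.range(0, 500)

// Sequential run; for the parallel variant, call .par on vertexList
// (on Scala 2.13+ that requires the scala-parallel-collections module).
val (total, elapsedNs) = time(vertexList.map(v => v.toLong * v).sum)
```

If the sequential run is faster per pair than the parallel one, contention (shared map, shared Random, I/O) is the likely cause.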

Daniel C. Sobral
  • Why do you recommend `Vector` in this example? `List` seems fine if he's just `foreach`-ing. – schmmd Mar 17 '12 at 02:56
  • I have it mutable since the user may want to add vertices at any time - I'm new to Scala of course so perhaps there is a better way of handling this. I should note though that I tend to deal with large graphs (at least a few thousand vertices) so efficiency/scalability is key. As for wasting time somewhere else, I'm fairly sure - the most computationally expensive part is essentially doing a random walk between pairs of vertices, which doesn't mutate any data. – adelbertc Mar 17 '12 at 03:43
  • @schmmd The `.par` call will create a new, parallel collection. Both for `List` and `Vector`, at the moment, the collection created will be `ParVector` (the static type differs, but the implementing class is the same). A `ParVector` just uses an underlying `Vector` and overrides methods, so converting a `Vector` into it is O(1), while converting a `List` into it is O(n). In fact, any parallel collection you convert a `List` into will be O(n), because `List` is not amenable to parallelism. – Daniel C. Sobral Mar 17 '12 at 14:57
  • Currently I'm using an `Array[Int]`, not `Vector[Int]`.. what are the differences between the two? I take it `Vector`s are similar to `List`s (i.e. immutable) but allow O(1) random access? – adelbertc Mar 17 '12 at 18:44
  • @kstruct In your question you said `vertexList is a List[Int]`. It's difficult to advise about some code when we are not getting correct information. `Vector` is immutable, and it's close to O(1) random access, but `Array` has peculiarities of its own because of how the JVM treats it. But once you call `.par`, you get a NEW collection, and how that collection is being generated is important. – Daniel C. Sobral Mar 18 '12 at 05:04
  • Ah my bad - it was initially a `List[Int]` but I later changed it to `Array[Int]` once I found out it didn't support efficient random access. I will change it to `Vector[Int]` and do the separate collection generation - thanks! How can I convert an `ArrayBuffer[Int]` to a `Vector[Int]`? I can't find something like a `toVector` method in the Scala docs... I have an `ArrayBuffer[Int]` initially since the graph can be mutated (edges created), and the graph potentially is large so the creation of a whole new `Vector` is not ideal... – adelbertc Mar 18 '12 at 05:18
  • Found it - currently I'm doing `Vector(myArray: _*)` hopefully that's a good way to do it - please let me know if it's not. Thanks for all your help, I've learned a lot in this thread! Gotta love SO. – adelbertc Mar 18 '12 at 07:11
  • @kstruct That's good enough, but many prefer something like `Vector() ++ myArray`. It can be more efficient if you don't have a `Seq`. – Daniel C. Sobral Mar 19 '12 at 02:47

I'm not completely sure about it, but I think nested foreach loops are rather slow, because lots of closure objects get created. See: http://scala-programming-language.1934581.n4.nabble.com/for-loop-vs-while-loop-performance-td1935856.html

Try rewriting it using a while loop.

Also, Lists are only efficient for head access; Arrays are probably faster.
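A sketch of that suggestion, with nested index-based while loops over an Array[Int]; the multiplication and the running `total` are illustrative stand-ins for the real per-pair algorithm:

```scala
val vertexArr = Array(1, 2, 3, 4) // sample vertices
var total = 0 // stand-in for whatever the per-pair work produces

var i = 0
while (i < vertexArr.length) {
  var j = 0
  while (j < vertexArr.length) {
    // run algorithm between vertexArr(i) and vertexArr(j)
    total += vertexArr(i) * vertexArr(j)
    j += 1
  }
  i += 1
}
```

The indices are plain Ints and the Array[Int] elements stay unboxed, so no per-iteration objects are allocated.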

Jens Schauder
  • Should I change the List[Int] that holds the vertices of the graph into an Array[Int], or should I change the ListBuffer[Int] of the adjacency list to ArrayBuffer[Int], or both? – adelbertc Mar 16 '12 at 19:37
  • Yes, try to use an Array and use a while loop on the iterator instead of the foreach loop. But it would also be interesting to see what you actually do inside the loop. Maybe there is more room for optimizations. Edit: Answered the same minute ;-)... Use it for both, or play around a bit. – drexin Mar 16 '12 at 19:37
  • To use a while loop, do you mean something like (see above). – adelbertc Mar 16 '12 at 19:51
  • The linked thread is quite old. Lists are efficient for iteration (all you are doing is head access over and over again on the tail list), but Arrays may have benefits with memory locality. – schmmd Mar 16 '12 at 19:58
  • Nonsense. You'll lose parallelism. – Daniel C. Sobral Mar 17 '12 at 02:10