So I've been working with parallel collections in Scala for a graph project I'm working on, I've got the basics of the graph class defined, it is currently using a scala.collection.mutable.HashMap
where the key is Int
and the value is ListBuffer[Int]
(adjacency list). (EDIT: This has since been change to ArrayBuffer[Int]
I had done a similar thing a few months ago in C++, with a std::vector<int, std::vector<int> >
.
What I'm trying to do now is run a metric between all pairs of vertices in the graph, so in C++ I did something like this:
// myVec = std::vector<int> of vertices
for (std::vector<int>::iterator iter = myVec.begin(); iter != myVec.end(); ++iter) {
for (std::vector<int>::iterator iter2 = myVec.begin();
iter2 != myVec.end(); ++iter2) {
/* Run algorithm between *iter and *iter2 */
}
}
I did the same thing in Scala, parallelized, (or tried to) by doing this:
// vertexList is a List[Int] (NOW CHANGED TO Array[Int] - see below)
vertexList.par.foreach(u =>
vertexList.foreach(v =>
/* Run algorithm between u and v */
)
)
The C++ version is clearly single-threaded, the Scala version has .par
so it's using parallel collections and is multi-threaded on 8 cores (same machine). However, the C++ version processed 305,570 pairs in a span of roughly 3 days, whereas the Scala version so far has only processed 23,573 in 17 hours.
Assuming I did my math correctly, the single-threaded C++ version is roughly 3x faster than the Scala version. Is Scala really that much slower than C++, or am I completely mis-using Scala (I only recently started - I'm about 300 pages into Programming in Scala)?
Thanks! -kstruct
EDIT To use a while loop, do I do something like..
// Where vertexList is an Array[Int]
vertexList.par.foreach(u =>
while (i <- 0 until vertexList.length) {
/* Run algorithm between u and vertexList(i) */
}
}
If you guys mean use a while loop for the entire thing, is there an equivalent of .par.foreach
for whiles?
EDIT2 Wait a second, that code isn't even right - my bad. How would I parallelize this using while loops? If I have some var i
that keeps track of the iteration, then wouldn't all threads be sharing that i
?