I'm trying to use Scala's parallel collections to dispatch some computations in parallel. Because there's a lot of input data, I'm using mutable arrays to store data to avoid GC issues. This is the initial approach I took:
// initialize the reusable input data structure
val inputData = new Array[Array[Int]](Runtime.getRuntime.availableProcessors * ChunkSize)
for (i <- inputData.indices) {
  inputData(i) = new Array[Int](arraySize)
}

// process the input
while (haveMoreInput()) {
  // read the input -- must be sequential!
  for (array <- inputData) {
    for (index <- 0 until arraySize) {
      array(index) = deserializeFromExternalSource()
    }
  }

  // map the data in parallel
  // note that the input data is NOT modified by longRunningProcess
  val results = for (array <- inputData.par) yield {
    longRunningProcess(array)
  }

  // use the results -- must be sequential and ordered as input
  for (result <- results.toArray) {
    useResult(result)
  }
}
Given that a `ParallelArray`'s underlying array can be safely reused (viz., modified and used as the underlying structure of another `ParallelArray`), the above snippet should work as expected. However, when run, it crashes with a memory error:

*** Error in `java': double free or corruption (fasttop): <memory address> ***
This is ostensibly related to the fact that the parallel collection directly uses the array it was created from; perhaps it's attempting to free this array when it goes out of scope. In any case, creating a new array on each loop iteration isn't an option, again due to memory constraints. Explicitly creating a `var parInputData = inputData.par`, both inside and outside of the `while` loop, leads to the same double-free error.
I can't simply make `inputData` itself a parallel collection, because it needs to be populated sequentially (when I tried assigning into a parallel version, the assignments were not performed in order). Using a `Vector` as the outer data structure seems to work for relatively small input sizes (< 1000000 input arrays) but leads to GC overhead exceptions on large inputs.
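For concreteness, a minimal sketch of that `Vector`-outer-structure variant (assuming a pre-2.13 Scala, where parallel collections ship in the standard library; `fillFromSource` and `longRunningProcess` are hypothetical stand-ins for the question's deserialization and processing steps):

```scala
// Vector[Array[Int]] as the outer structure: populated sequentially,
// then mapped in parallel. The helper functions are hypothetical stand-ins.
def fillFromSource(dest: Array[Int], base: Int): Unit =
  for (i <- dest.indices) dest(i) = base + i          // stand-in for deserialization

def longRunningProcess(a: Array[Int]): Int = a.sum    // stand-in for the real work

val arraySize = 4
val inputData: Vector[Array[Int]] = Vector.fill(8)(new Array[Int](arraySize))

// sequential, in-order population
for ((array, n) <- inputData.zipWithIndex) fillFromSource(array, n * arraySize)

// parallel map; result order matches input order
val results: Vector[Int] = inputData.par.map(longRunningProcess).seq.toVector
```

On Scala 2.13+ this additionally needs the scala-parallel-collections module and `import scala.collection.parallel.CollectionConverters._`.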
The approach I ended up taking involved making a `Vector[Vector[Array[Int]]]`, with the outer vector's length equal to the number of parallel threads in use. I then manually populated each sub-`Vector` with a chunk of input arrays and did a parallel map over the outer vector.

This final approach works, but it is tedious to manually separate the input into chunks and add those chunks to a parallel collection another level deep. Is there a way to let Scala reuse a mutable array for parallel operations?
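The chunked layout described above can be sketched with `grouped` doing the splitting (same caveats: pre-2.13 parallel collections assumed, and `longRunningProcess` is a hypothetical stand-in):

```scala
def longRunningProcess(a: Array[Int]): Int = a.sum    // hypothetical stand-in

val numThreads = 4  // in the question: the number of parallel threads in use
val flat: Vector[Array[Int]] = Vector.tabulate(8)(n => Array(n, n, n))

// split into numThreads chunks: Vector[Vector[Array[Int]]]
val chunkLen = (flat.length + numThreads - 1) / numThreads
val chunks: Vector[Vector[Array[Int]]] =
  flat.grouped(chunkLen).map(_.toVector).toVector

// parallel map over the outer vector; each chunk is processed sequentially,
// and flattening restores the original input order
val results: Vector[Int] =
  chunks.par.map(_.map(longRunningProcess)).seq.toVector.flatten
```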
EDIT: Benchmarking the parallel-vector solution above against a manually parallelized solution using synchronous queues showed the parallel vector to be about 50% slower. I'm wondering whether this is simply the overhead of a better abstraction, or whether the gap could be closed by using parallel arrays rather than `Vector`s; that would be yet another benefit of arrays over `Vector`s.
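If parallel arrays do close the gap, `ParArray.handoff` (in the 2.x parallel collections) is documented to wrap an existing array without copying, which might also bear on the reuse question above, provided the array is not mutated while a parallel operation is in flight. A hedged sketch (`longRunningProcess` is again a stand-in):

```scala
import scala.collection.parallel.mutable.ParArray

def longRunningProcess(a: Array[Int]): Int = a.sum    // hypothetical stand-in

val arraySize = 3
val inputData = Array.fill(4)(new Array[Int](arraySize))

// sequential fill of the reusable buffers
var v = 0
for (array <- inputData; i <- array.indices) { array(i) = v; v += 1 }

// wrap the existing array without copying; the buffers must not be
// modified while the parallel map is running
val results: Array[Int] =
  ParArray.handoff(inputData).map(longRunningProcess).toArray
```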