
I want to write a parallel map function in Haskell that's as efficient as possible. My initial attempt, which currently seems to be the best, is simply:

pmap :: (a -> b) -> [a] -> [b]
pmap f = runEval . parList rseq . map f

I'm not seeing an even division of work across CPUs, however. Since this may be related to the number of sparks, could I write a pmap that divides the list into as many segments as there are CPUs, so that a minimal number of sparks is created? I tried the following, but the performance (and the number of sparks) is much worse:

pmap :: (a -> b) -> [a] -> [b]
pmap f xs = concat $ runEval $ parList rseq $ map (map f) (chunk xs) where
    -- chunk size is length xs `div` 4, i.e. roughly one chunk per core
    -- (4 hardcoded); max 1 guards against lists shorter than 4
    chunk xs = chunk' (max 1 (length xs `div` 4)) xs
    chunk' n xs | length xs <= n = [xs]
                | otherwise = take n xs : chunk' n (drop n xs)

The worse performance may be correlated with the higher memory use. The original pmap does scale somewhat on 24-core systems, so it's not that I lack data. (My desktop has 4 CPUs, which is why I hardcoded that number.)
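
For reference, here is a sketch of the same chunking idea without the hardcoded 4, assuming `GHC.Conc.numCapabilities` reflects the `-N` setting (the name `pmapChunked` and the round-up chunk size are illustrative only). It uses `rdeepseq` rather than `rseq`, since with `rseq` each spark only evaluates its chunk's spine to WHNF and almost no work actually happens in parallel:

import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parList, rdeepseq, runEval)
import GHC.Conc (numCapabilities)

pmapChunked :: NFData b => (a -> b) -> [a] -> [b]
pmapChunked f xs = concat $ runEval $ parList rdeepseq $ map (map f) chunks
  where
    -- one chunk per capability, rounding up so no elements are dropped
    size   = max 1 ((length xs + numCapabilities - 1) `div` numCapabilities)
    chunks = go xs
    go []  = []
    go ys  = let (front, rest) = splitAt size ys in front : go rest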

Edit 1

Some performance data using `+RTS -H512m -N -sstderr -RTS` is here:

  • Tuning `parMap` to spark once for each core isn't a sure way to go - each element might take a different amount of work to compute. For example, in the trivial `fib` implementation, the work increases significantly for each successive element, so placing the last `n` elements in the same spark will result in very little parallelism. – Thomas M. DuBuisson May 11 '11 at 20:24

1 Answer

The parallel package defines a number of parallel map strategies for you:

parMap :: Strategy b -> (a -> b) -> [a] -> [b]

parMap is a combination of parList and map. There is also specific support for chunking the list:

parListChunk :: Int -> Strategy a -> Strategy [a]

Divides a list into chunks, and applies the strategy `evalList strat` to each chunk in parallel.
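
For illustration (not part of the original answer), the question's pmap and a chunked variant might be written as follows; the chunk size of 100 is an arbitrary placeholder to benchmark:

import Control.Parallel.Strategies

-- one spark per element, each evaluated to WHNF (like the question's pmap)
pmap :: (a -> b) -> [a] -> [b]
pmap = parMap rseq

-- one spark per 100-element chunk
pmapChunk :: (a -> b) -> [a] -> [b]
pmapChunk f xs = map f xs `using` parListChunk 100 rseq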

You should be able to use a combination of these to get any sparking behavior you desire. Or, for even more control, there is the Par monad package, which lets you control (purely) how much parallel work is created.
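
As a rough sketch of the Par monad approach (assuming the monad-par package; the name pmapPar is illustrative, and the package also ships a ready-made parMap combinator):

import Control.DeepSeq (NFData)
import Control.Monad.Par (runPar, spawnP, get)

pmapPar :: NFData b => (a -> b) -> [a] -> [b]
pmapPar f xs = runPar $ do
  ivars <- mapM (spawnP . f) xs  -- fork one task per element
  mapM get ivars                 -- collect the results in order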


References: The haddock docs for the parallel package

  • Great, that gives some control over the number of sparks. Sorry I missed it on hackage ... at least it's on stackoverflow now. Unfortunately, performance isn't much better, but that's likely my fault. Oddly, `-g1` for parallel garbage collection brings the garbage collection stat way down, but runtime doesn't change... – gatoatigrado May 11 '11 at 20:51
  • @gatoatigrado: try using `-qa` and `-qg`. These two options sometimes help GC performance of parallel programs. Sometimes they're worse though, so be sure to test them. – John L May 11 '11 at 21:16
  • In case anyone visits this question, the answer to this sister question might be useful (in particular, the rdeepseq): http://stackoverflow.com/questions/5606165/parallel-map-in-haskell – gatoatigrado Jun 10 '11 at 06:21