
I want to write a parallel map function in Haskell that's as efficient as possible. My initial attempt, which currently seems to be the best, is simply:

pmap :: (a -> b) -> [a] -> [b]
pmap f = runEval . parList rseq . map f

I'm not seeing an even division of work across CPUs, however. Since this may be related to the number of sparks, could I write a pmap that divides the list into as many segments as there are CPUs, so that a minimal number of sparks is created? I tried the following, but the performance (and the number of sparks) is much worse:

pmap :: (a -> b) -> [a] -> [b]
pmap f xs = concat $ runEval $ parList rseq $ map (map f) (chunk xs) where
    -- chunk size is length xs `div` 4, i.e. roughly one chunk per core
    -- (4 hardcoded); max 1 guards against lists shorter than 4
    chunk xs = chunk' (max 1 (length xs `div` 4)) xs
    chunk' n xs | length xs <= n = [xs]
                | otherwise = take n xs : chunk' n (drop n xs)

The worse performance may be correlated with the higher memory use. The original pmap does scale somewhat on 24-core systems, so it's not that I lack data. (My desktop has 4 CPUs, which is why I hardcoded that number.)
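
For reference, here is a sketch of the same chunking idea without the hardcoded 4, assuming `GHC.Conc.numCapabilities` reflects the `-N` setting (the name `pmapChunked` and the round-up chunk size are illustrative only). It uses `rdeepseq` rather than `rseq`, since with `rseq` each spark only evaluates its chunk's spine to WHNF and almost no work actually happens in parallel:

import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parList, rdeepseq, runEval)
import GHC.Conc (numCapabilities)

pmapChunked :: NFData b => (a -> b) -> [a] -> [b]
pmapChunked f xs = concat $ runEval $ parList rdeepseq $ map (map f) chunks
  where
    -- one chunk per capability, rounding up so no elements are dropped
    size   = max 1 ((length xs + numCapabilities - 1) `div` numCapabilities)
    chunks = go xs
    go []  = []
    go ys  = let (front, rest) = splitAt size ys in front : go rest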

Edit 1

Some performance data using `+RTS -H512m -N -sstderr -RTS` is here:

  • Tuning `parMap` to spark once for each core isn't a sure way to go - each element might take a different amount of work to compute. For example, in the trivial `fib` implementation, the work increases significantly for each successive element, so placing the last `n` elements in the same spark will result in very little parallelism. – Thomas M. DuBuisson May 11 '11 at 20:24

1 Answer

The parallel package defines a number of parallel map strategies for you:

parMap :: Strategy b -> (a -> b) -> [a] -> [b]

parMap is a combination of parList and map. There is also specific support for chunking the list:

parListChunk :: Int -> Strategy a -> Strategy [a]

Divides a list into chunks, and applies the strategy `evalList strat` to each chunk in parallel.
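
For illustration (not part of the original answer), the question's pmap and a chunked variant might be written as follows; the chunk size of 100 is an arbitrary placeholder to benchmark:

import Control.Parallel.Strategies

-- one spark per element, each evaluated to WHNF (like the question's pmap)
pmap :: (a -> b) -> [a] -> [b]
pmap = parMap rseq

-- one spark per 100-element chunk
pmapChunk :: (a -> b) -> [a] -> [b]
pmapChunk f xs = map f xs `using` parListChunk 100 rseq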

You should be able to use a combination of these to get any sparking behavior you desire. Or, for even more control, there is the Par monad package, which lets you control (purely) how much parallel work is created.
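
As a rough sketch of the Par monad approach (assuming the monad-par package; the name pmapPar is illustrative, and the package also ships a ready-made parMap combinator):

import Control.DeepSeq (NFData)
import Control.Monad.Par (runPar, spawnP, get)

pmapPar :: NFData b => (a -> b) -> [a] -> [b]
pmapPar f xs = runPar $ do
  ivars <- mapM (spawnP . f) xs  -- fork one task per element
  mapM get ivars                 -- collect the results in order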


References: The haddock docs for the parallel package

  • Great, that gives some control over the number of sparks. Sorry I missed it on hackage ... at least it's on stackoverflow now. Unfortunately, performance isn't much better, but that's likely my fault. Oddly, `-g1` for parallel garbage collection brings the garbage collection stat way down, but runtime doesn't change... – gatoatigrado May 11 '11 at 20:51
  • @gatoatigrado: try using `-qa` and `-qg`. These two options sometimes help GC performance of parallel programs. Sometimes they're worse though, so be sure to test them. – John L May 11 '11 at 21:16
  • In case anyone visits this question, the answer to this sister question might be useful (in particular, the rdeepseq): http://stackoverflow.com/questions/5606165/parallel-map-in-haskell – gatoatigrado Jun 10 '11 at 06:21