
Most questions I've seen regarding parallel list processing are concerned with the kind of parallelism achieved by chunking the list and processing each chunk in parallel with the others.

My question is different.

I have something simpler/more stupid in mind concerning a sequence of maps and folds: what if we simply set up a job for the first map, which would run in parallel with the second map?

The structure of the computation I'm thinking of:

xs -- initial data

ys = y1 : y2 : ... : yn -- y1 = f x1, ... and so on.
-- computed in one parallel job.

zs = z1 : z2 : ... : zn -- z1 = g y1, ... and so on.
-- computed in another job (the applications of `g`), i.e., the "main" job.

Will something in the spirit of the following code work?

ys = map f xs
zs = ys `par` map g' ys
    where g' y = y `pseq` g y

I'd only need to arrange for ys to be evaluated with a kind of deepSeq there, instead of simply writing:

ys `par` ...

So while the main job is busy computing an application of g, we are also forcing the early computation of ys in parallel.
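For concreteness, here is a minimal self-contained sketch of what I mean. forceList, pipeline, and the toy f/g are names I've made up for illustration; forceList plays the role of the "kind of deepSeq" (forcing the spine and each element to WHNF). par and pseq live in GHC.Conc in base, so no extra packages are needed; without the -threaded runtime the spark simply never runs in parallel, but the result is the same.

```haskell
import GHC.Conc (par, pseq)

-- Force the spine of the list and each element to WHNF,
-- standing in for a "kind of deepSeq" on a flat list.
forceList :: [a] -> ()
forceList []       = ()
forceList (y : ys) = y `pseq` forceList ys

-- Spark the full evaluation of ys in parallel, then run
-- the second map in the main job.
pipeline :: (a -> b) -> (b -> c) -> [a] -> [c]
pipeline f g xs = forceList ys `par` map g ys
  where
    ys = map f xs

main :: IO ()
main = print (pipeline (+ 1) (* 2) [1 .. 10 :: Int])
```

Whether the spark actually converts depends on the runtime (compile with -threaded and run with +RTS -N2); semantically the function is just map g . map f.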

Is there anything wrong with this approach?

The documentation and examples for par and pseq are a bit too scarce for me to understand how this will work out. My code differs from the examples I've seen in that the values on the left-hand sides of par and pseq are not the same values that appear on the right.

Discussion

I can think of similar parallelization for other kinds of transformations (fold, scan, and more complex compositions).
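For instance, the same trick for a map followed by a strict fold might look like the sketch below (mapThenSum and forceList are made-up names; this is only meant to show the shape, not a measured speedup):

```haskell
import GHC.Conc (par, pseq)
import Data.List (foldl')

-- Force spine and elements to WHNF (a poor man's deepseq for flat lists).
forceList :: [a] -> ()
forceList []       = ()
forceList (y : ys) = y `pseq` forceList ys

-- The same "vertical" trick for a map followed by a fold:
-- spark the map, fold in the main job.
mapThenSum :: Num b => (a -> b) -> [a] -> b
mapThenSum f xs = forceList ys `par` foldl' (+) 0 ys
  where
    ys = map f xs
```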

For one thing, I'm afraid that the elements of ys could be evaluated twice (once by the spark and once by the main job) if g is too quick...

This should give at most a fixed two-times speedup on 2 cores.

And if there are more such costly transformation nodes (say, N) in my pipeline, I'd get at most a fixed N-times speedup.

As for my "vertical" parallelization vs. the "horizontal" one of the linked questions (1, 2, etc., achieved with parMap): I want to get faster streaming. In other words, I want to see the intermediate results (the incremental inits zs) sooner in the first place.

CORRECTION

It seems that I didn't understand pseq. Consider my old code from above:

zs = ys `par` map g' ys
    where g' y = y `pseq` g y

and re-read the documentation for pseq:

seq is strict in both its arguments, so the compiler may, for example, rearrange

a `seq` b

into

b `seq` a `seq` b

[...] it can be a problem when annotating code for parallelism, because we need more control over the order of evaluation; we may want to evaluate a before b, because we know that b has already been sparked in parallel with par.

So, in my case, y is part of the value which I want to have sparked and forced with par. So it's like b, and there is no need/point in putting it under a pseq, right?

But I'm still a bit afraid that its computation can accidentally be duplicated if we are too fast in the map...

I've also had a look at Ch. 2 of Parallel and Concurrent Programming in Haskell, but it talks about rpar and rseq... And it seems to imply that it is OK to do rpar/rpar or rpar/rseq/rseq, without extra worry about waiting for the value in the rpar/rpar case. Have I got something wrong?
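For comparison, here is how I understand my pipeline would look with rpar/rseq from Control.Parallel.Strategies (this needs the parallel package; pipelineEval is a made-up name). Note that rpar only sparks its argument to WHNF, so this sparks only the first cons cell of ys unless combined with something like evalList rseq:

```haskell
import Control.Parallel.Strategies (rpar, rseq, runEval)

-- rpar sparks its argument (to WHNF only!) and returns immediately;
-- rseq evaluates its argument in the current job before continuing.
-- To spark the whole list, one would use a strategy like evalList rseq
-- instead of bare rpar.
pipelineEval :: (a -> b) -> (b -> c) -> [a] -> [c]
pipelineEval f g xs = runEval $ do
  ys <- rpar (map f xs) -- sparked, potentially in parallel
  rseq (map g ys)       -- the main job consumes ys
```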

  • why not just use one of the functor laws and do `zs = map (g . f)` (in parallel if you wish) instead? - I know it's no answer, just saying – Random Dev May 05 '15 at 04:52
  • @CarstenKönig Well, I thought of this as a way to parallelize things and get a 2-times speedup. With `map (g . f)` I'm doing everything in one thread. If `g` and `f` are costly and/or the lists are huge, the 2-times speedup should be noticeable compared to the direct composition, and the cost of the list construction negligible. – imz -- Ivan Zakharyaschev May 05 '15 at 04:56
  • @CarstenKönig The answers there are the other kind of parallelization: they parallelize the computations of the elements of a list. I'd call one kind of parallelization *"horizontal"* and the other one *"vertical"* -- looking at my pseudo-code for the structure of the computation in my question. (Actually, if I need to, I could do both. But my approach would result in something like faster *"streaming"*, whereas theirs -- in a faster approach to the end result.) I want to see the intermediate results (the incremental `inits zs`) sooner in the first place. – imz -- Ivan Zakharyaschev May 05 '15 at 05:02
  • ok - so you want to do some kind of streaming/pipes/conduits kind of stuff - sorry, I misunderstood - maybe you'll find something in the pipes ecosystem (https://hackage.haskell.org/package/pipes-concurrency-2.0.1/docs/Pipes-Concurrent-Tutorial.html comes to mind) – Random Dev May 05 '15 at 05:06
  • @CarstenKönig Ok, there may be something in that ecosystem. But I also want to learn the basic primitive stuff. The simple and stupid code I came up with seems to express my wishes reasonably. So, without turning to a higher-level library, how can this parallelization be done? I find the available documentation and examples for `par` and `pseq` too scarce for me to be able to understand whether this primitive approach is alright (in simple cases). – imz -- Ivan Zakharyaschev May 05 '15 at 05:11
  • Did you take a [look](http://chimera.labs.oreilly.com/books/1230000000929/ch04.html#sec_par-monad-streams) at Simon Marlow's excellent book on parallel and concurrent Haskell? – j.p. May 05 '15 at 11:01
  • @j.p. Not a very deep look. Only Chapter 2, as I mentioned in a reference. Thanks for the reference to Chapter 4! I believe it would give the right ideas about implementing the stuff I care about here. The thing I would like to know, but am missing, though: the usage of the basic `par` (and `pseq`) in this manner. The book uses higher-level stuff (the `Par` monad, `rseq` and `rpar`); I should probably look into their source code and see how they are implemented through `par` and `pseq` (if this is the case), and fill my understanding gap... – imz -- Ivan Zakharyaschev May 05 '15 at 11:30
  • I suspect this is less popular because many/most practical applications will either be handled by superscalar CPUs with little help from you or will require very specialized hardware to be effective. The trouble is that this sort of parallelism on general-purpose hardware seems likely to involve a lot of communication among threads, and also looks rather tough to balance. I'm not an expert, though. – dfeuer May 05 '15 at 23:54
  • @dfeuer Aha, the idea is that in `map (g . f)` the composition would be optimized by hardware pipelining, so no need to worry. One detail I haven't mentioned: doing the two layers (`map f` and `map g`) seems natural if `ys` will be re-used in different branches of the computation afterwards. As for the communication overhead: my thought was that (at least, from the Haskell programmer's perspective) having just two parallel layers would cause less overhead than, say, `parMap` or `par` inside a `map`, because I'm creating 2 constant long-lived sparks/threads instead of many. The internals may differ. – imz -- Ivan Zakharyaschev May 06 '15 at 08:15
  • @dfeuer As for balancing: I agree. It's not obvious, and ideal balancing would hardly be possible. Relying on the scheduler to make use of idle CPUs is all that's left to hope for. – imz -- Ivan Zakharyaschev May 06 '15 at 08:18
  • An interesting answer from Michael worth studying can be found [there](http://meta.stackoverflow.com/a/293639/94687). – imz -- Ivan Zakharyaschev May 07 '15 at 14:07
