
I'm considering learning a new language these days, to be used for high-performance computing on a cluster of computers we have. Among the candidate languages, I'm considering Haskell.

I have read a bit about Haskell, but I still have questions about using it for high-performance and distributed computing, which the language is known for. However, I have also read debates claiming that Haskell is not good for those types of systems because of laziness. I can summarize my questions in the following lines:

  1. Haskell uses green threads, which is great for handling a big number of concurrent connections, but what happens when one task takes longer than average and blocks the rest? Does the whole thread block (Node.js style), does the next task get forwarded to another processor/thread (Go style), does it use a reductions technique (Erlang style) that kicks the task out of the processing context after a predetermined number of ticks, or something else?

  2. In a distributed computing environment, what happens to lazily evaluated functions? Do they have to be forced to be strict?

  3. If one function/module requires strict evaluation but depends on other lazy functions/modules, do I have to modify the code of those other functions/modules to make them strict as well, or will the compiler handle this for me and force everything in that chain to be strict or lazy?

  4. When processing a very large sequence of data, how does Haskell handle parallel processing? Is it by following some kind of implicit map-reduce technique, or do I have to do it myself?

  5. Is there a clustering abstraction in the language that handles the computing power for me and automatically forwards the next task to a free processor, wherever it is, be it on the same computer or on another computer in the same cluster?

  6. How does Haskell ensure that work is distributed fairly and evenly across all the available cores, whether on the same computer or across the available cluster?

securecurve

2 Answers

  1. GHC uses a pool of available work (called sparks) and a work-stealing system: when a thread runs out of work, it will look for work in the pool or on the work queues of other threads that it can steal.

  2. There is no built-in support for distributed computing as there is in (say) Erlang. The semantics are whatever your implementation defines. There are existing implementations like Cloud Haskell that you can look at for examples.

  3. Neither. Haskell will automatically do whatever work is necessary to provide a value that is demanded and no more.

  4. Haskell (and GHC in particular) does not do anything to automatically parallelize evaluation, because there is no known universal strategy for parallelizing that is strictly better than not parallelizing; you annotate the parallelism yourself (a small sketch follows this list). See Why is there no implicit parallelism in Haskell? for more info.

  5. No. See (2).

  6. For the same machine, it uses the pool of sparks and the work-stealing system described above. There is no notion of "clustering".
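To make points 1 and 4 concrete, here is a minimal sketch (illustrative only; `fib` and the argument sizes are made up, not taken from the answer) of creating sparks with `Control.Parallel.Strategies` from the `parallel` package. `rpar` merely puts work into the spark pool described in point 1; an idle capability may steal and evaluate it, while `rseq` evaluates in the current thread:

```haskell
import Control.Parallel.Strategies (runEval, rpar, rseq)

-- Deliberately slow, purely functional work (illustrative placeholder).
fib :: Integer -> Integer
fib n | n < 2     = n
      | otherwise = fib (n - 1) + fib (n - 2)

main :: IO ()
main = do
  let (a, b) = runEval $ do
        x <- rpar (fib 33)   -- sparked: may be evaluated by another capability
        y <- rseq (fib 32)   -- evaluated here, to weak head normal form
        return (x, y)
  print (a + b)
```

Compile with `ghc -threaded` and run with `+RTS -N` to give the runtime more than one capability; without that, the sparked computation is simply evaluated by the main thread when its result is demanded.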

For an overview of parallel and concurrent programming in Haskell, see the free book of the same name by Simon Marlow, a primary author of GHC's runtime system.

Rein Henrichs
  • While reading your answer, a question popped into my mind. Is there a way in Haskell to automatically create as many OS threads as there are cores available on one machine? – securecurve Jun 07 '17 at 11:49
  • 1
    @securecurve By setting the RTS option `-N`, either as a command line argument to the executable or as a compiler flag. See https://stackoverflow.com/questions/3704430/why-do-you-have-to-use-both-a-compiler-flag-and-a-run-time-flag-to-get-multicore – Rein Henrichs Jun 07 '17 at 19:21
  • 1
    I should also mention that 'threads' in (1) is really [*capabilities*](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Scheduler#Capabilities), not green threads. Green threads are scheduled onto capabilities, which are in turn mapped onto cores. – Rein Henrichs Jun 07 '17 at 19:22
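Following up on the `-N` discussion in the comments above, a small sketch (using only functions from `base`; the printed message is arbitrary) showing that the number of capabilities can also be queried and set from inside the program:

```haskell
import Control.Concurrent (getNumCapabilities, setNumCapabilities)
import GHC.Conc (getNumProcessors)

main :: IO ()
main = do
  procs <- getNumProcessors    -- number of processors the runtime detected
  setNumCapabilities procs     -- one capability per core, like +RTS -N
  caps  <- getNumCapabilities
  putStrLn ("Using " ++ show caps ++ " of " ++ show procs ++ " processors")
```

The program still needs to be compiled with `-threaded` for multiple capabilities to do anything.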

Multithreading

As far as SMP parallelism is concerned, Haskell is very effective. It's not quite automatic, but the parallel library makes it really easy to parallelise just about anything. Because the sparks are so cheap, you can be pretty careless and just ask for lots of parallelism; the runtime will then know what to do.
Unlike in most other languages, it is not a big problem if you have highly branched data structures, tricky dynamic algorithms etc. – thanks to the purely functional paradigm, parallel Haskell never needs to worry about locks when shared data is accessed.
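For instance, a rough sketch of "just asking for lots of parallelism" (the list and the `expensive` placeholder are made up) using the `parallel` library's `parMap`:

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Stand-in for real per-element work.
expensive :: Int -> Int
expensive n = sum [1 .. n * 1000]

main :: IO ()
main = print (sum (parMap rdeepseq expensive [1 .. 10000]))
```

Every list element becomes its own spark; because sparks are so cheap, it is fine to create ten thousand of them and let the runtime decide how many actually run in parallel.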

I think the biggest caveat is memory: GHC's garbage collector is not concurrent, and the functional style is rather allocation-happy.
Apart from that, it's possible to write programs that look like they're parallel but really don't do any work at all and just start and immediately return, because of laziness. Some testing and experience is still necessary. But laziness and parallelism are not incompatible; at least not if you make sure you have big enough “chunks” of strictness in it. And forcing something to be strict is largely trivial.
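One way to get those big enough chunks (the chunk size of 1000 and the `expensive` placeholder are arbitrary illustrations) is a chunking strategy combined with deep evaluation:

```haskell
import Control.Parallel.Strategies (withStrategy, parListChunk, rdeepseq)

-- Placeholder for real per-element work.
expensive :: Int -> Int
expensive n = n * n + 1

main :: IO ()
main = print (sum forced)
  where
    -- Evaluate the list in chunks of 1000 elements; each chunk is forced to
    -- normal form inside its own spark, so every spark does a real chunk of
    -- strict work instead of fizzling on an unevaluated thunk.
    forced = withStrategy (parListChunk 1000 rdeepseq)
                          (map expensive [1 .. 100000])
```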

Simpler, common parallelism tasks (which could be expressed in a map-reduce manner, or the classic array-vector stuff – the ones that are also easy in many other languages) can generally be handled even more easily in Haskell with libraries that parallelise the data structures; the best known of these is repa.
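A rough sketch of the repa style, assuming its documented `Data.Array.Repa` API (the array contents here are arbitrary); `computeP` lets the library decide how to split the work across capabilities:

```haskell
import qualified Data.Array.Repa as R                      -- from the `repa` package
import Data.Array.Repa (Array, U, DIM1, Z (..), (:.) (..))

main :: IO ()
main = do
  let n   = 1000000 :: Int
      arr = R.fromListUnboxed (Z :. n) (map fromIntegral [1 .. n]) :: Array U DIM1 Double
  -- The map builds a delayed (fused) array; computeP evaluates it in parallel.
  doubled <- R.computeP (R.map (* 2) arr) :: IO (Array U DIM1 Double)
  total   <- R.sumAllP doubled
  print total
```

As with the strategies above, this needs `-threaded` and `+RTS -N` to actually use several cores.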

Distributed computing

There has been quite some work on Cloud Haskell, which is basically Erlang in library form. This kind of task is less straightforward: the idea of explicit message sending is a bit against Haskell's grain, and many aspects of the workflow become more cumbersome when the language is so heavily focused on strong static typing (which in Haskell is otherwise often a huge bonus that doesn't just improve safety and performance but also makes programs easier to write).
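A rough sketch of what the explicit message-passing style looks like with Cloud Haskell (the `distributed-process` package), assuming its documented API; note in particular that the `createTransport` signature from `network-transport-tcp` has changed between versions, so treat the setup below as illustrative rather than definitive:

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad.IO.Class (liftIO)
import Control.Distributed.Process
import Control.Distributed.Process.Node (initRemoteTable, newLocalNode, runProcess)
import Network.Transport.TCP (createTransport, defaultTCPParameters)

-- A lightweight, Erlang-style process: wait for a String and log it.
echo :: Process ()
echo = do
  msg <- expect
  say ("got: " ++ msg)

main :: IO ()
main = do
  Right transport <- createTransport "127.0.0.1" "10501" defaultTCPParameters
  node <- newLocalNode transport initRemoteTable
  runProcess node $ do
    pid <- spawnLocal echo      -- spawn a process on the local node
    send pid "hello"            -- explicit message send, as in Erlang
    liftIO (threadDelay 100000) -- crude wait so the echo process can run
```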

I think it's not far off to use Haskell in a distributed concurrent manner, but we can't say it's mature in that role yet. For distributed concurrent tasks, Erlang itself is certainly the way to go.

Clusters

Honestly, Haskell won't help you much at all here. A cluster is of course in principle a special case of a distributed setup, so you could employ Cloud Haskell; but in practice the needs are very different. The HPC world today (and probably for quite some time into the future) hinges on MPI, and though there is a bit of existing work on MPI bindings, I haven't found them usable, at least not out of the box.

MPI is definitely also quite against Haskell's grain, what with its FORTRAN-oriented array centrism, weird ways of handling types and so on. But unless you go nuts with Haskell's cool features (though it is often so tempting!), there is no reason you couldn't write typical number-crunching code in Haskell as well. The only problem is again support/maturity, but it's a considerable one; so for cluster computing I'd recommend C++, Python or Julia instead.

An interesting alternative is to generate MPI-parallelised C or C++ code from Haskell. Paraiso is one nice project that does this.

Pipe dreams

I have often thought about what could be done to make distributed computing feasible in idiomatic Haskell. In principle I believe laziness could be a big help there. The model I'd envision is to let all machines independently compute the same program, but make use of the fact that Haskell evaluation generally has no predetermined order. The order would be randomised on each machine. The runtime would also track how long each computation branch took to complete, and how big the result is. If a result is deemed both expensive and compact enough to warrant it, it would then be broadcast to the other nodes, together with some suitable hash that would allow them to shortcut that computation.

Such a system would never be quite as efficient as a hand-optimised MPI application, but it could at least offer the same asymptotics in many cases. And it could handle vastly more complex algorithms with ease.

But again, that's totally just my vague hopes for the not-so-near future.


You said concurrency (which isn't so much about computation as about interaction), but it seems your question is in essence about pure computations?
leftaroundabout
  • Thanks for the detailed answer :) ... regarding your final note, yes, you are right, concurrency is not parallelism, but I used the word concurrent here because I had in mind Go's technique of handling things concurrently on the same thread, while forwarding the next piece of work to a free processor/thread when the first one blocks. – securecurve Jun 07 '17 at 11:31
  • You said: `But laziness and parallelism are not incompatible; at least not if you make sure you have big enough “chunks” of strictness in it` – can you elaborate on this please? Who controls the `big enough chunk`? And what makes it strict – do you mean I should program the functions as strict in the first place? – securecurve Jun 07 '17 at 11:38
  • @securecurve no, you normally don't need to change the functions themselves. Just when you want two _results_ computed in parallel, it may be necessary to _force_ them to normal form before the spark thinks it's done with its work. – leftaroundabout Jun 07 '17 at 11:54
  • How do I force them to normal form (you mean strict?)? Programmatically, using a compiler directive, or something else? – securecurve Jun 07 '17 at 12:01
  • Classically, with the [`pseq` operator](http://hackage.haskell.org/package/parallel-3.2.1.1/docs/Control-Parallel.html#v:pseq), or its “deep” pendants from the [deepseq library](http://hackage.haskell.org/package/deepseq-1.4.2.0/docs/Control-DeepSeq.html#v:rnf). – leftaroundabout Jun 07 '17 at 12:20
  • For example, if you have some expensive computation that yields a list `exl` as its result, you would in principle just write ``exl `par` otherStuff`` to compute `exl` in parallel with some `otherStuff`. But now that computation might be lazy and only generate list elements _as needed_. In this case, the spark would fizzle before actually finishing the list, but you could force it to actually compute the whole list by writing ``(rnf exl `pseq` ()) `par` otherStuff``. – leftaroundabout Jun 07 '17 at 12:24
  • ...it's not difficult, but it can definitely be a bit confusing to think about; for instance I'm right now trying to recall why we need the distinction between `seq` and `pseq` again. – leftaroundabout Jun 07 '17 at 12:27
  • Very much appreciated for the detailed explanation – securecurve Jun 07 '17 at 12:37
  • The idea that you propose in the `pipe dreams` section is very much like Haskell's `work stealing notion` on one computer (without the hash broadcast), but we need to think of it in terms of computers in a cluster rather than threads in one computer. Can Cloud Haskell do that job (or any other alternative)? – securecurve Jun 08 '17 at 09:36
  • Well, the big difference is memory. The sparks in the work-stealing system are all ploughing different, possibly overlapping regions of a single field, as it were. In fact, the threads generally _rely_ on work already having been done by others: there's no explicit passing of information between threads, ever. In the end you just reap the harvest with a single IO action. – None of this works as such in a distributed setting. On a cluster, you could fake it with a parallel file system, but I doubt it would be a good idea. – leftaroundabout Jun 08 '17 at 10:45
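For reference, a compilable version of the snippet sketched in the comments above (`exl` and `otherStuff` are the placeholder names from that discussion; the actual computations are made up):

```haskell
import Control.Parallel (par, pseq)
import Control.DeepSeq (rnf)

-- Expensive computation whose entire result we want evaluated in the spark.
exl :: [Int]
exl = [x * x | x <- [1 .. 1000000]]

otherStuff :: Int
otherStuff = sum [1 .. 1000000 :: Int]

main :: IO ()
main = do
  -- rnf forces exl all the way to normal form inside the spark, so the spark
  -- does not fizzle after producing only the demanded prefix of the list.
  let r = (rnf exl `pseq` ()) `par` otherStuff
  print (r, length exl)
```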