32

People argue that Haskell has an advantage in parallelism because its data structures are immutable. But Haskell is also lazy, which means data actually can be mutated: a thunk is overwritten with its evaluated result.

So it seems laziness could undermine the advantage of immutability. Am I wrong, or does Haskell have countermeasures for this problem? Or is this simply a feature of Haskell?
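
For concreteness, here is a small sketch of the mutation I mean, using `Debug.Trace` as a side-channel marker. The trace message fires only once, because after the first force the shared thunk has been overwritten with its result:

```haskell
import Debug.Trace (trace)

main :: IO ()
main = do
  -- 'x' is a single shared thunk; 'trace' marks when it is evaluated.
  let x = trace "evaluating" (2 + 2 :: Int)
  -- "evaluating" appears on stderr only once: after the first force,
  -- the thunk's memory is overwritten with the value 4 and reused.
  print (x + x)
```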

Boann
  • 48,794
  • 16
  • 117
  • 146
damhiya
  • 473
  • 3
  • 7
  • 4
    "data actually can be mutated from thunk to evaluated result" can you say more about what you mean here and why you believe this is true? – Thomas M. DuBuisson Aug 12 '19 at 01:03
  • @Thomas https://wiki.haskell.org/Thunk I don't have exact knowledge of the Haskell implementation, but a mutable thunk seems like the simplest way to implement laziness. – damhiya Aug 12 '19 at 01:16
  • Also : https://en.m.wikibooks.org/wiki/Haskell/Laziness#Thunks_and_Weak_head_normal_form – damhiya Aug 12 '19 at 01:20
  • 1
    The point is, each thread must synchronise on whether the thunk has already been evaluated, or else each thread must re-evaluate the thunk. (As I understand it.) – damhiya Aug 12 '19 at 01:26
  • Laziness often only applies to **initialization** - supposing you have a global lock, then you can initialize an object on a single thread and make the other threads (or promises/tasks/futures) _wait_ and then provide an initialized, immutable, data value to the new concurrent processes. No contradiction there. – Dai Aug 12 '19 at 01:31
  • @Dai There is no guarantee that the object is fully evaluated before other threads start, since Haskell allows infinite data structures. – damhiya Aug 12 '19 at 01:46
  • 1
    @Dai So the thread has to block other tasks every time evaluation of a shared value is performed. – damhiya Aug 12 '19 at 01:51

1 Answer

36

Yes, GHC’s RTS uses thunks to implement non-strict evaluation, and they use mutation under the hood, so they require some synchronisation. However, this synchronisation is simplified by the fact that most heap objects are immutable and functions are referentially transparent.

In a multithreaded program, evaluation of a thunk proceeds as follows:

  • The thunk is replaced with a BLACKHOLE object (by default GHC blackholes lazily, writing the BLACKHOLE when the thread next yields to the scheduler, rather than atomically on entry)

  • If the same thread attempts to force the thunk after it’s been updated to a BLACKHOLE, this represents an infinite loop, and the RTS throws an exception (<<loop>>)

  • If a different thread attempts to force the thunk while it’s a BLACKHOLE, it blocks until the original thread has finished evaluating the thunk and updated it with a value

  • When evaluation is complete, the original thread atomically replaces the thunk with its result, e.g., using a compare-and-swap (CAS) instruction
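
These steps can be exercised from ordinary Haskell code (a sketch, not a view into the RTS): two threads force the same shared thunk, and whichever loses the race simply observes the value the winner installed. Compiling with `ghc -threaded` and running with `+RTS -N2` allows the threads to run in parallel:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- A deliberately expensive shared thunk (a top-level CAF).
expensive :: Integer
expensive = sum [1 .. 10000000]

main :: IO ()
main = do
  done <- newEmptyMVar
  -- Both threads force the *same* thunk. Whichever enters it first
  -- blackholes it; the other either blocks on the BLACKHOLE or briefly
  -- duplicates the work, but both end up with the one shared result.
  _ <- forkIO (evaluate expensive >>= putMVar done)
  r1 <- evaluate expensive
  r2 <- takeMVar done
  print (r1 == r2)
```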

So there is a potential race here: if two threads attempt to force the same thunk at the same time, they may both begin evaluating it. In that case, they will do some redundant work—however, one thread will succeed in overwriting the BLACKHOLE with the result, and the other thread will simply discard the result that it calculated, because its CAS will fail.

Safe code cannot detect this, because it can’t obtain the address of an object or determine the state of a thunk. And in practice, this type of collision is rare for a couple of reasons:

  • Concurrent code generally partitions workloads across threads in a manner suited to the particular problem, so there is low risk of overlap

  • Evaluation of thunks is generally fairly “shallow” before you reach weak head normal form, so the probability of a “collision” is low
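
As a small sketch of how shallow this is: `evaluate` from `Control.Exception` forces only to WHNF, i.e. the outermost constructor. In the contrived list below, the tail beyond the mapped elements is an `error`, yet forcing the list and even taking its first five elements never reaches it:

```haskell
import Control.Exception (evaluate)

main :: IO ()
main = do
  let xs = map (* 2) [1 .. 5 :: Int] ++ error "tail never forced here"
  -- 'evaluate' forces only the outermost (:) constructor, not the
  -- elements or the rest of the spine.
  _ <- evaluate xs
  putStrLn "WHNF reached without touching the tail"
  -- 'take 5' stops before the erroring tail, so this is safe too.
  print (take 5 xs)
```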

So thunks ultimately provide a good performance tradeoff when implementing non-strict evaluation, even in a concurrent context.

Jon Purdy
  • 53,300
  • 8
  • 96
  • 166
  • 4
    The first step isn't quite right, because GHC uses lazy blackholing (a different sense of "lazy" from the usual one). See the section "Black holes and revelations" in http://mainisusuallyafunction.blogspot.com/2011/10/thunks-and-lazy-blackholes-introduction.html?m=1 – dfeuer Aug 12 '19 at 06:19
  • 4
    Consider including a link to the paper with all the gory details, [Haskell on a Shared-Memory Multiprocessor](https://simonmar.github.io/bib/papers/multiproc.pdf). A lot of time is spent discussing very low level details that are relevant to making the result ultimately go fast in practice. – Alexis King Aug 12 '19 at 14:18