
I am trying to understand how where clauses evaluate in Haskell. Say we have this toy example, where bar, baz, and bat are functions defined somewhere:

func x = foo i j k
  where
    foo i j k = i + j + k
    k = bat x
    j = baz k
    i = bar j

How does the line func x = foo i j k expand? Does it evaluate to something like func x = foo(i(j(k)), j(k), k) or func x = foo(i, j, k)?

  • You probably should write `foo i j k = i + j + k`? – Willem Van Onsem Jan 22 '18 at 14:42
  • And nothing is evaluated. Haskell will only build an expression tree that *can* later be evaluated in case we need it. – Willem Van Onsem Jan 22 '18 at 14:43
  • @WillemVanOnsem that is good to know. However, if we do need this code somewhere and it will be evaluated, how does one explain the values i, j and k? What I am unsure about are the implications of i depending on j, j depending on k, and k depending on x. Will j somehow be concatenated to i, or does something else happen? – Simon Carlson Jan 22 '18 at 14:45
  • well you should see variables are references to *expressions*. So if `func x` is somehow evaluated, it will evaluate `foo i j k` and at that point `i = bar j`, `j = baz k`, `k = bat x`. Now that thus means that we will call `bar j`, in case `bar` needs `j` (that is not certain), it will also evaluate `baz k`, and if `baz k` needs `k`, it will evaluate `bat x`. Then we need to evaluate `j`, but since it is a reference to the same expression, we do not need to do that a second time, since it was already evaluated before, the same for `k`. – Willem Van Onsem Jan 22 '18 at 14:51
  • @WillemVanOnsem so if `bar` needs `j` and `baz` needs `k`, the variables will be unraveled to `i = bar baz bat x`, `j = baz bat x`, `k = bat x`? – Simon Carlson Jan 22 '18 at 14:56
  • no, first of all the syntax is wrong (you need parentheses). But furthermore the calculations are "shared". So if the first thread will calculate `k`, we no longer need to calculate `bat x` a second time. – Willem Van Onsem Jan 22 '18 at 14:59
  • `bat x` is calculated by the first thread as `k` and the other threads will not have to calculate it again. But does the function `foo` apply the value `k` a total of three times in its additions, two of them indirectly through the values `j` and `i`? – Simon Carlson Jan 22 '18 at 15:07
  • Could you fix your code to not have a serious type error? As it stands, it's really hard to tell exactly what you mean, as you could mean several different things, depending on how you fix it. Is `foo` a function or not? – Carl Jan 22 '18 at 15:08
  • @SimonCarlson: the value of `k` is calculated *at most* one time. It also depends on the functions `bar`, `baz`, and `bat`. – Willem Van Onsem Jan 22 '18 at 15:08
  • @Carl `foo` is supposed to be a function. I just got started with Haskell and am unsure of the syntax, if you could edit the code so that `foo` is a function that would be swell. – Simon Carlson Jan 22 '18 at 15:09
  • @WillemVanOnsem thank you. – Simon Carlson Jan 22 '18 at 15:10
  • Where clauses are not evaluated. They introduce variable bindings (definitions). Definitions like `foo x = func i j k` are not evaluated either. Expressions are evaluated. – n. m. could be an AI Jan 22 '18 at 15:18
  • You might like to play with this to get some intuition about evaluation order (or the lack thereof): http://chrisuehlinger.com/LambdaBubblePop/ – Li-yao Xia Jan 22 '18 at 15:19

4 Answers


Intro

I will assume you meant to write this code:

func :: Int -> Int
func x = foo
  where
    foo = i + j + k
    k = bat x
    j = baz k
    i = bar j

This way it will type check, and all three functions you defined in the where clause will eventually get called. If this is not what you meant, do read on anyway, as I will not only give you a depiction of the way your code evaluates, but also a method for determining the answer yourself. It may be a bit of a long story, but I hope it will be worth your time.
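
To follow along concretely, here is a runnable version of that program. The definitions of bar, baz, and bat are hypothetical stand-ins of my own choosing, since the question leaves them undefined:

```haskell
-- Hypothetical stand-ins for the helpers the question leaves undefined.
bar, baz, bat :: Int -> Int
bar = (+ 1)
baz = (+ 2)
bat = (+ 3)

func :: Int -> Int
func x = foo
  where
    foo = i + j + k
    k = bat x   -- k depends only on x
    j = baz k   -- j depends on k
    i = bar j   -- i depends on j

main :: IO ()
main = print (func 0)  -- k = 3, j = 5, i = 6, so foo = 14
```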

Core

How your code evaluates depends on your choice of compiler, but I will suppose you are using GHC, which transforms your code several times before reducing it to machine code.

First, where clauses are replaced with let clauses. This is done to reduce Haskell syntax to the simpler Core syntax. Core is close enough to a mathematical theory called lambda calculus for its eventual evaluation to proceed on that solid foundation. At this point, your code will look somewhat like this:

func = λx ->
      let { k = bat x } in
      let { j = baz k } in
      +
        (+ (bar j) j)
        k

As you see, one of the definitions from the where clause of your Haskell code disappeared altogether by the time of its arrival at the Core stage (actually, it was inlined), and the others were rewritten in let notation. The binary operation (+) was rewritten in Polish (prefix) notation to make it unambiguous (it is now clear that i + j is to be computed first). All these conversions were performed without altering the meaning of the code.
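
Back in Haskell source terms, the where-to-let step can be sketched like this (bar, baz, and bat are again hypothetical stand-ins, and the inlining of i mirrors what the Core listing above shows):

```haskell
-- Hypothetical stand-ins for the undefined helpers.
bar, baz, bat :: Int -> Int
bar = (+ 1)
baz = (+ 2)
bat = (+ 3)

-- Equivalent to the where-clause version: the bindings become
-- nested lets, and i = bar j has been inlined, as in the Core.
funcLet :: Int -> Int
funcLet x =
  let k = bat x in
  let j = baz k in
  (+) ((+) (bar j) j) k
```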

Graph machine

Then, the resulting lambda expression will be reduced to a directed graph and executed by the Spineless Tagless G-machine (STG machine). In a sense, Core is to the STG machine what assembler is to a Turing machine, though the former is a lambda expression while the latter is a sequence of imperative instructions. (As you may see by now, the distinction between functional and imperative languages runs rather deep.) The STG machine translates the expressions you give it into imperative instructions executable on a conventional computer, through a rigorously defined operational semantics -- that is, for every syntactic feature of Core (of which it only has about four), there is a piece of imperative assembler-like code that performs the same thing, and a Core program is translated into a combination of these pieces.

The key feature of the operational semantics of Core is its laziness. As you know, Haskell is a lazy language. What that means is that a function yet to be computed and the value of that function look the same: a sequence of bytes in RAM. As the program starts, everything is laid out as functions (closures, to be precise), but once a function's return value is computed, it is put in the place of the closure, so that all further accesses to this location in memory immediately receive the value. In other words, a value is computed only when it is needed, and only once.
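
This "computed only once" behaviour can be observed with Debug.Trace (the names demo, j, and k here are my own illustrative choices):

```haskell
import Debug.Trace (trace)

-- Sharing in action: k is used twice, but the trace message fires at
-- most once, because after the first evaluation the closure in memory
-- is overwritten with the computed value.
demo :: Int
demo = j + k
  where
    k = trace "evaluating k" (1 + 2)
    j = k * 10

main :: IO ()
main = print demo  -- "evaluating k" appears once (on stderr); result is 33
```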

As I said, an expression in Core will be turned to a directed graph of computations that depend on each other. For example:

(Figure: a directed graph of the computation -- boxes for `bar j`, `baz k`, and `bat x`, connected by arrows showing which computations depend on which, with everything eventually leading to `x`.)

If you look closely, I hope this graph will remind you of the program we started with. Please note two particulars about it:

  • All the arrows eventually lead to x, which is consistent with our idea that supplying x is enough to evaluate func.

  • Sometimes two arrows lead to the same box. That means the value of this box will be evaluated once, and the second time we shall have the value for free.

So, the STG machine will take some Core code and create an executable that computes the value of a graph more or less similar to the one in the picture.

Execution

Now, as we made it to the graph, it's easy to see that the computation will proceed thus:

  1. As func is called, the value of x is received and put in the corresponding box.
  2. bat x is computed and put in a box.
  3. k is set to be the same as bat x. (This step will probably be dropped by some of the optimizations GHC runs on the code, but literally a let clause requests that its value be stored separately.)
  4. baz k is computed and put in a box.
  5. j is set to be the same as baz k, the same as with k in step 3.
  6. bar j is computed and put in a box.
  7. Contrary to what one would expect in the light of steps 3 and 5, i is not set to anything. As we saw in the listing of Core for our program, it was optimized away.
  8. + (bar j) j is computed. j is already computed, so baz k will not be called this time, thanks to laziness.
  9. The topmost value is computed. Again, there is no need to compute bat x this time, as it was computed previously and stored in the right box.
  10. Now, the value of func x is itself a box, ready to be used by the caller any number of times.

I would highlight that this is what's going to be happening at the time you execute the program, as opposed to compiling it.

Epilogue

That's the story, to the best of my knowledge. For further clarifications, I refer the reader to the works of Simon Peyton Jones: the book on the design of Haskell and the article on the design of the graph machine, which together describe the inner workings of GHC down to the smallest peculiarity.

To review the Core generated by GHC, simply pass the flag -ddump-simpl when you compile something. It will hurt your eyes at first, but one gets used to it.

Enjoy!

postscriptum

As @DanielWagner pointed out in the comments, the laziness of Haskell has some further consequences that we would have needed to consider had we dissected a less contrived case. Specifically: a computation may not need to evaluate some of the boxes it points to, or even any of them at all. In that case, those boxes stay untouched and unevaluated, while the computation completes and delivers a result that is in actuality independent of the subordinate boxes anyway. An example of such a function: f x = 3. This has far-reaching consequences: say, if x were impossible to compute, as in "infinite loop", a function that does not use x in the first place would not enter that loop at all. Thus, it is sometimes desirable to know in detail which sub-computations will necessarily be launched from a given computation and which may not. Such intricacies reach a bit farther than I'm prepared to describe in this answer, so at this cautionary note I will end.
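
The f x = 3 situation can be observed directly (the Int type and the use of undefined as the "impossible" argument are my own choices for the sketch):

```haskell
-- f never inspects its argument, so the argument's box stays
-- untouched: even a bottom value like undefined causes no error.
f :: Int -> Int
f _ = 3

main :: IO ()
main = print (f undefined)  -- prints 3; undefined is never forced
```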

Ignat Insarov
  • N.B. this description of evaluation depends critically (and silently) on the fact that `(+) :: Int -> Int -> Int` is strict. With a lazier operation, there may be significantly more possible orders for the various steps outlined in the "Execution" section, and indeed some of the steps might never happen at all. – Daniel Wagner Jan 22 '18 at 18:54
  • @DanielWagner Could you kindly elaborate? I will gladly integrate your corrections as a *postscriptum*, with attribution. – Ignat Insarov Jan 22 '18 at 18:55
  • As far as I can tell, there is no actual *correction* needed: everything stated here is correct. It is merely slightly misleading, in that it bakes in some assumptions about which functions demand which values. For example, the fact that `bat x` is the first thing computed is true, but only because `(+)` forces at least one of its arguments, and both of its arguments force `bat x` eventually. e.g. compare the term `let f _ _ = () in f (f (bar j) j) k`, which has the exact same dependency graph as `+ (+ (bar j) j) k`, but in which `bat x` need never be evaluated at all. – Daniel Wagner Jan 22 '18 at 19:00
  • In other words, a complete explanation (which I understand this answer isn't really trying to be) would need to discuss how one chooses which node in the computation graph to reduce first; in this example, "the deepest leaf" is the right answer, but that's only coincidence. – Daniel Wagner Jan 22 '18 at 19:06
  • @DanielWagner To be honest, I don't even know how to begin explaining this. It's more than a bit beyond my level of knowledge. I added a note that, hopefully, relays some of your points. Please edit the answer further if you find it desirable, as I really cannot proceed further with the subject of evaluation order. – Ignat Insarov Jan 22 '18 at 19:19
  • I'm not convinced your explanation is correct at any optimization level. With absolutely no optimization, `bat x` is not computed in step 2; instead a thunk is created. With even a little bit of strictness analysis, different things will happen. With a lot of optimization it will turn into the same code as if you'd written it in C. – augustss Jan 24 '18 at 12:03

The order of evaluation is not specified (in the Haskell report) for addition. As a result, the evaluation order depends on the type of your number and its Num instance.

For example, below are two types whose Num instances have reversed orders of evaluation. I have used custom Show instances and debug print-outs to make the point easier to see in the output.

import Debug.Trace

newtype LeftFirst = LF { unLF :: Integer }
instance Show LeftFirst where show (LF x) = x `seq` "LF"++show x
newtype RightFirst = RF { unRF :: Integer }
instance Show RightFirst where show (RF x) = x `seq` "RF"++show x

instance Num LeftFirst where
    (+) a b = a `seq` LF (unLF a + unLF b)
    fromInteger x = trace ("LF" ++ show x) (LF x)

instance Num RightFirst where
    (+) a b = b `seq` RF (unRF a + unRF b)
    fromInteger x = trace ("RF" ++ show x) (RF x)

func :: Num a => a -> a
func x = foo i j k
  where
    foo i j k = i + j + k
    k = bat x
    j = baz k
    i = bar j

bar,baz,bat :: Num a => a -> a
bar = (+1)
baz = (+2)
bat = (+3)

And notice the output:

*Main> func (0 :: LeftFirst)
LF0
LF3
LF2
LF1
LF14
*Main> func (0 :: RightFirst)
RF3
RF0
RF2
RF1
RF14
Thomas M. DuBuisson

First off, foo i j k will parse as ((foo i) j) k. This is because all functions in Haskell take exactly one argument: the one argument of foo is i, then the result (foo i) is a function whose one argument is j, and so on. So it's neither foo(i(j(k))) nor foo(i, j, k); however, I should note that ((foo i) j) k ends up being in some sense equivalent to foo(i, j, k), for reasons we can go into if you'd like.
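
One-argument-at-a-time application is what makes partial application possible. A small sketch (the names foo and partial are illustrative):

```haskell
-- foo takes its arguments one at a time: foo 1 is itself a function.
foo :: Int -> Int -> Int -> Int
foo i j k = i + j + k

-- (foo 1) 2 is a function still awaiting only k.
partial :: Int -> Int
partial = foo 1 2

main :: IO ()
main = print (partial 3)  -- 6, the same as ((foo 1) 2) 3
```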

Second, i, j, and k will be passed to foo not as reduced values but as expressions, and it's up to foo to decide (via foo's formula) how and when to evaluate each of the supplied expressions. In the case of (+), I'm pretty sure it's simply left-to-right. So, i will be forced first, but of course to evaluate i, all the others will need to be evaluated, so you trace out the data dependency tree to its leaves, which bottoms out at x.

Perhaps the subtlety here is that there is a distinction between "reduced" and "fully reduced." i will be reduced first, in the sense that one layer of abstraction -- the name i -- is removed and replaced with the formula for i, but it's not fully reduced at that point, and to fully reduce i we need to fully reduce its data dependencies.
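
A related way to see "reduced but not fully reduced" is GHC's notion of weak head normal form (a sketch, using seq rather than the name-unfolding described above; the name shallow is illustrative):

```haskell
-- seq reduces its first argument one layer only, to weak head normal
-- form: forcing the pair exposes the (,) constructor but leaves the
-- fields as unevaluated thunks, so the undefineds are never touched.
shallow :: (Int, Int)
shallow = (undefined, undefined)

main :: IO ()
main = shallow `seq` putStrLn "forced to WHNF without touching the fields"
```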

Fried Brice
  • "and it's up to foo to decide (via foo's formula) how and when to evaluate each of the supplied expressions" No it's not. You can force some amount of sequentiality using constructs like `seq`, but generally it's not up to any particular function but up to the RTS to determine if, when and how anything is evaluated. – Cubic Jan 22 '18 at 15:54
  • I should say "assuming you print the result" or something, which is the assumption of the question. – Fried Brice Jan 22 '18 at 15:58
  • Assuming the result is called for, `foo`'s formula will be used to decide which arguments get evaluated and in what order. – Fried Brice Jan 22 '18 at 16:00
  • @FriedBrice `foo`'s formula does not uniquely define a reduction order. The compiler is free to generate code that evaluates `i`, `j`, and `k` in any order. Remember that the parse tree is not the evaluation order. – Carl Jan 22 '18 at 17:07

If I understand your question (and follow-up comments) correctly, I guess you aren't really interested in the "order of evaluation" or the details of how a particular Haskell compiler actually performs the evaluation. Instead, you're simply interested in understanding what the following program means (i.e., its "semantics"):

func x = foo i j k
  where
    foo i j k = i + j + k
    k = bat x
    j = baz k
    i = bar j

so that you can predict the value of, say, func 10. Right?

If so, then what you need to understand is:

  • how names are scoped (e.g., so that you understand that the x in the definition of k refers to the parameter x in the definition of func x and so on)
  • the concept of "referential transparency", which is basically the property of Haskell programs that a variable can be replaced with its definition without affecting the meaning of the program.

With respect to variable scoping when a where clause is involved, it's useful to understand that a where clause is attached to a particular "binding" -- here, the where clause is attached to the binding for func x. The where clause simultaneously does three things:

First, it pulls into its own scope the name of the thing being defined in the associated binding (here func) and the names of any parameters (here x). Any reference to func or x within the where clause will refer to the func and x in the func x binding being defined (assuming that the where clause doesn't itself define new bindings for func or x that "shadow" those bindings -- that's not an issue here). In your example, the implication is that the x in the definition k = bat x refers to the parameter x in the binding for func x.

Second, it introduces into its own scope the names of all the things being defined by the where clause (here, foo, k, j, and i), though not the parameters. That is, the i, j, and k in the binding foo i j k are NOT introduced into scope, and if you compile your program with the -Wall flag, you'll get a warning about shadowed bindings. Because of this, your program is actually equivalent to:

func x = foo i j k
  where
    foo i' j' k' = i' + j' + k'
    k = bat x
    j = baz k
    i = bar j

and we'll use this version in what follows. The implication of the above is that the k in j = baz k refers to the k defined by k = bat x, while the j in i = bar j refers to the j defined by j = baz k, but the i, j, and k defined by the where clause have nothing to do with the i', j', and k' parameters in the binding foo i' j' k'. Also note that the order of bindings doesn't matter. You could have written:

func x = foo i j k
  where
    foo i' j' k' = i' + j' + k'
    i = bar j
    j = baz k
    k = bat x

and it would have meant exactly the same thing. Even though i = bar j is defined before the binding for j is given, that makes no difference -- it's still the same j.

Third, the where clause also introduces into the scope of the right-hand side of the associated binding the names discussed in the previous paragraph. For your example, the names foo, k, j, and i are introduced into the scope of the expression on the right hand side of the associated binding func x = foo i j k. (Again, there's a subtlety if any shadowing is involved -- a binding in the where clause would override the bindings of func and x introduced on the left-hand side and also generate a warning if compiled with -Wall. Fortunately, your example doesn't have this problem.)

The upshot of all this scoping is that, in the program:

func x = foo i j k
  where
    foo i' j' k' = i' + j' + k'
    k = bat x
    j = baz k
    i = bar j

every usage of each name refers to the same thing (e.g., all the k names refer to the same thing).

Now, the referential transparency rule comes into play. You can determine the meaning of an expression by substituting any name by its definition (taking care to avoid name collisions or so-called "capture" of names). Therefore, if we were evaluating func 10, it would be equivalent to:

func 10                                        -- binds x to 10
= foo i j k                                    -- by defn of func

at this stage, the definition of foo is used which binds i' to i, j' to j, and k' to k in order to produce the expression:

= i + j + k                                    -- by defn of foo
= bar j + baz k + bat x                        -- by defs of i, j, k
= bar (baz k) + baz k + bat x                  -- by defn of j
= bar (baz (bat x)) + baz (bat x) + bat x      -- by defn of k
= bar (baz (bat 10)) + baz (bat 10) + bat 10   -- by defn of x

So, if we defined:

bat = negate
baz y = 7 + y
bar z = 2*z

then we'd expect:

func 10 = 2 * (7 + negate 10) + (7 + negate 10) + negate 10
        = -19

which is exactly what we get:

> func 10
-19
K. A. Buhr