21

How should one reason about function evaluation in examples like the following in Haskell:

let f x = ...
    x = ...
in map (g (f x)) xs

In GHC, sometimes (f x) is evaluated only once, and sometimes once for each element in xs, depending on what exactly f and g are. This can be important when f x is an expensive computation. It has just tripped a Haskell beginner I was helping and I didn't know what to tell him other than that it is up to the compiler. Is there a better story?

Update

In the following example (f x) will be evaluated 4 times:

let f x = trace "!" $ zip x x
    x = "abc"
in map (\i -> lookup i (f x)) "abcd" 
Grzegorz Chrupała
  • 3,053
  • 17
  • 24
  • Do you have an example where `f x` is being evaluated more than once? – hammar Feb 24 '12 at 23:38
  • @hammar: I added such an example. – Grzegorz Chrupała Feb 25 '12 at 00:00
  • 1
    @Grzegorz You should mention that that only holds if you don't optimise. If you allow higher rank types, I can give you an example where optimising can't eliminate repeated evaluation. Interested? – Daniel Fischer Feb 25 '12 at 00:11
  • It's pretty easy to get the ghci to do it, try this http://hpaste.org/64300 -- in the more complicated case it doesn't make the inference, though it does with compilation. (I see Gzegorz added an example while I composed this.) – applicative Feb 25 '12 at 00:16

4 Answers4

9

With language extensions, we can create situations where f x must be evaluated repeatedly:

{-# LANGUAGE GADTs, Rank2Types #-}
module MultiEvG where

data BI where
    B :: (Bounded b, Integral b) => b -> BI

foo :: [BI] -> [Integer]
foo xs = let f :: (Integral c, Bounded c) => c -> c
             f x = maxBound - x
             g :: (forall a. (Integral a, Bounded a) => a) -> BI -> Integer
             g m (B y) = toInteger (m + y)
             x :: (Integral i) => i
             x = 3
         in map (g (f x)) xs

The crux is to have f x polymorphic even as the argument of g, and we must create a situation where the type(s) at which it is needed can't be predicted (my first stab used an Either a b instead of BI, but when optimising, that of course led to only two evaluations of f x at most).

A polymorphic expression must be evaluated at least once for each type it is used at. That's one reason for the monomorphism restriction. However, when the range of types it can be needed at is restricted, it is possible to memoise the values at each type, and in some circumstances GHC does that (needs optimising, and I expect the number of types involved mustn't be too large). Here we confront it with what is basically an inhomogeneous list, so in each invocation of g (f x), it can be needed at an arbitrary type satisfying the constraints, so the computation cannot be lifted outside the map (technically, the compiler could still build a cache of the values at each used type, so it would be evaluated only once per type, but GHC doesn't, in all likelihood it wouldn't be worth the trouble).

  • Monomorphic expressions need only be evaluated once, they can be shared. Whether they are is up to the implementation; by purity, it doesn't change the semantics of the programme. If the expression is bound to a name, in practice you can rely on it being shared, since it's easy and obviously what the programmer wants. If it isn't bound to a name, it's a question of optimisation. With the bytecode generator or without optimisations, the expression will often be evaluated repeatedly, but with optimisations repeated evaluation would indicate a compiler bug.
  • Polymorphic expressions must be evaluated at least once for every type they're used at, but with optimisations, when GHC can see that it may be used multiple times at the same type, it will (usually) still be shared for that type during a larger computation.

Bottom line: Always compile with optimisations, help the compiler by binding expressions you want shared to a name, and give monomorphic type signatures where possible.

Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
8

Your examples are indeed quite different.

In the first example, the argument to map is g (f x) and is passed once to map most likely as partially applied function. Should g (f x), when applied to an argument within map evaluate its first argument, then this will be done only once and then the thunk (f x) will be updated with the result.

Hence, in your first example, f xwill be evaluated at most 1 time.

Your second example requires a deeper analysis before the compiler can arrive at the conclusion that (f x) is always constant in the lambda expression. Perhaps it will never optimize it at all, because it may have knowledge that trace is not quite kosher. So, this may evaluate 4 times when tracing, and 4 times or 1 time when not tracing.

Ingo
  • 36,037
  • 5
  • 53
  • 100
  • 1
    Good point, I oversimplified the initial example. Re trace: if `f x` is expensive, it's easy to see that it's being re-evaluated also without using `trace`. – Grzegorz Chrupała Feb 25 '12 at 00:53
  • 1
    Yes. But the point is that the second example requires some code transformations (i.e. `let xxx = f x in map (\i -> lookup i xxx) "abcd"`) to factor out expensive constants like f x, while in the first example, even the dumbest compiler without any optimization whatever will generate code that leads to the result described (because the non-strict thunk evaluation and update happens in the RTS anyway). – Ingo Feb 25 '12 at 01:04
6

This is really dependent on GHC's optimizations, as you've been able to tell.

The best thing to do is to study the GHC core that you get after optimizing the program. I would look at the generated Core and examine whether f x had its own let statement outside the map or not.

If you want to be sure, then you should factor f x out into its own variable assigned in a let, but there's not really a guaranteed way to figure it out other than reading through Core.

All that said, with the exception of things like trace that use unsafePerformIO, this will never change the semantics of your program: how it actually behaves.

Louis Wasserman
  • 191,574
  • 25
  • 345
  • 413
6

In GHC without optimizations, the body of a function is evaluated every time the function is called. (A "call" means the function is applied to arguments and the result is evaluated.) In the following example, f x is inside a function, so it will execute each time the function is called. (GHC may optimize this expression as discussed in the FAQ [1].)

let f x = trace "!" $ zip x x
    x = "abc"
in map (\i -> lookup i (f x)) "abcd" 

However, if we move f x out of the function, it will execute only once.

let f x = trace "!" $ zip x x
    x = "abc"
in map ((\f_x i -> lookup i f_x) (f x)) "abcd" 

This can be rewritten more readably as

let f x = trace "!" $ zip x x
    x = "abc"
    g f_x i = lookup i f_x
in map (g (f x)) "abcd" 

The general rule is that, each time a function is applied to an argument, a new "copy" of the function body is created. Function application is the only thing that may cause an expression to re-execute. However, be warned that some functions and function calls do not look like functions syntactically.

[1] http://www.haskell.org/haskellwiki/GHC/FAQ#Subexpression_Elimination

Heatsink
  • 7,721
  • 1
  • 25
  • 36