20

I define two mutually recursive lists for even and odd numbers in ghci as follows:

> let evens = 0:map (+1) odds; odds = map (+1) evens

And then I consult the thunks using :sp

> :sp evens
evens = _
> :sp odds
odds = _
> take 5 evens
[0,2,4,6,8]
> :sp evens
evens = 0 : 2 : 4 : 6 : 8 : _
:sp odds
odds = _

Notice how the odds thunk is not evaluated although evens has been evaluated to the 5th element. I can think of an intuitive explanation for this. odds has to be explicitly invoked to be evaluated:

> take 5 odds
[1,3,5,7,9]
>:sp odds
odds = 1 : 3 : 5 : 7 : 9 : _

However, now when I do this:

> take 10 evens
[0,2,4,6,8,10,12,14,16,18]
> :sp evens
evens = 0 : 2 : 4 : 6 : 8 : 10 : 12 : 14 : 16 : 18 : _
> :sp odds
odds = 1 : 3 : 5 : 7 : 9 : 11 : 13 : 15 : 17 : _

Notice how the odds thunk is now evaluated whenever evens is evaluated? Why was odds not evaluated the first time and evaluated the second time and in all subsequent evaluations? What's happening?

is7s
  • 3,500
  • 1
  • 20
  • 41
  • 1
    couldn't reproduce that. what's your OS, ghc version etc? – jev Mar 18 '14 at 22:05
  • I can reproduce it on Win7 with GHC 7.6.3. I can also do `take 5 odds` and get the same behavior for `:sp evens`, which really doesn't make sense because `evens` has the literal `0` in it. – bheklilr Mar 18 '14 at 22:12
  • 7
    I have a guess: `evens` and `odds` are "sort of" polymorphic until you observe them explicitly the first time. Then they're magically monomorphed into `[Integer]` and from then on everything is visible. e.g. try adding `evens, odds :: [Integer]` to your `let` to see the difference. – Daniel Wagner Mar 18 '14 at 22:48
  • @bheklilr When a literal "looks like a function" it doesn't get early evaluated. Compare `let x = [1,2,3,4]` and `let y () = [1,2,3,4]`. If we do `:sp x` we get `x = [1,2,3,4]` but `:sp y ()` is `y = _`. So in this case the literal `0` in `evens` won't get early evaluated since it's not obviously a constant. – J. Abrahamson Mar 19 '14 at 00:55
  • 1
    @DanielWagner The default type signature for `evens` and `odds` is `[Integer]`, at least till ghc 7.6.3. This is the type inferred without having to explicitly mention it. Why should mentioning the type make any difference? I think this just means that there is a bug in either the type infrencer or `:sp` itself. – is7s Mar 19 '14 at 06:10

1 Answers1

12

This has to do with how mutually recursive bindings are compiled by GHC (and there's a difference whether the bindings have an explicit type signature or not).

Let's write the following simple program which exposes the same problem but removes all suspicion on the role that integer overloading or the monomorphism restriction could play:

module MutRec where

ft = False : map not tf
tf = map not ft

Loading this into GHCi (I'm using 7.6.3) yields:

*MutRec> take 5 ft
[False,False,False,False,False]
*MutRec> :sp ft
ft = False : False : False : False : False : _
*MutRec> :sp tf
tf = _

Let's look at the Core code for this module

$ ghc -O0 MutRec -fforce-recomp -ddump-simpl -dsuppress-all
[1 of 1] Compiling MutRec           ( MutRec.hs, MutRec.o )

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 28, types: 42, coercions: 0}

Rec {
ft1_rkA
ft1_rkA = : False a_rkC

tf1_rkB
tf1_rkB = map not ft1_rkA

a_rkC
a_rkC = map not tf1_rkB
end Rec }

ds_rkD
ds_rkD = (ft1_rkA, tf1_rkB)

ft
ft = case ds_rkD of _ { (ft2_Xkp, tf2_Xkr) -> ft2_Xkp }

tf
tf = case ds_rkD of _ { (ft2_Xkq, tf2_Xks) -> tf2_Xks }

This explains it all. The mutually recursive definitions end up in a Rec block, referring each other directly. But then GHC is building a pair ds_rkD and re-extracts the components from the pair. This is an extra indirection. It explains why after partially evaluating ft in GHCi, the top of tf will still appear as a thunk, even if underneath there has been evaluation. In fact, we can verify that just doing minimal evaluation on tf is enough to expose this:

*MutRec> take 5 ft
[False,False,False,False,False]
*MutRec> :sp ft
ft = False : False : False : False : False : _
*MutRec> :sp tf
tf = _
Prelude MutRec> seq tf ()
()
Prelude MutRec> :sp tf
tf = True : True : True : True : _

If we add explicit type sigantures to ft and tf or enable optimization, the tuple construction does not happen:

$ ghc -O MutRec -fforce-recomp -ddump-simpl -dsuppress-all
[1 of 1] Compiling MutRec           ( MutRec.hs, MutRec.o )

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 12, types: 11, coercions: 0}

Rec {
ft1
ft1 = map not tf

ft
ft = : False ft1

tf
tf = map not ft
end Rec }

Now GHCi will behave more naturally.


Edit

I've looked at the GHC sources to try to figure out the reason for the difference in behaviour. It seems it's a side effect of how type inference works for polymorphic bindings.

If a binding is polymorphic but doesn't have a type signature, then it's recursive uses are monomorphic. This is a restriction in Hindley-Milner that GHC also implements. If you want polymorphic recursion, you need an additional type signature.

To model this faithfully in the Core language, the desugarer makes a monomorphic copy of every unannotated recursive function. This monomorphic version is used in the recursive calls, the generalized version is used for external calls. You can see this even for a small function such as rep (which is a reimplementation of repeat). The desugared core for

rep x = x : rep x

is

rep
rep =
  \ (@ a_aeM) ->
    letrec {
      rep_aeJ
      rep_aeJ =
        \ (x_aeH :: a_aeM) -> : @ a_aeM x_aeH (rep_aeJ x_aeH); } in
    rep_aeJ

The outer rep is polymorphic, hence the type abstraction \ (@ a_aeM) -> in the beginning. The inner rep_aeJ is monomorphic and used in the recursive call.

If you add an explicit type annotation to rep

rep :: a -> [a]
rep x = x : rep x

then the recursive calls are to the polymorphic version, and the generated Core becomes simpler:

Rec {
rep
rep = \ (@ a_b) (x_aeH :: a_b) -> : @ a_b x_aeH (rep @ a_b x_aeH)
end Rec }

You can see how the type argument @ a_b is picked up in the beginning and reapplied to rep in every recursive call.

The tuple construction we're seeing for mutually recursive bindings is simply a generalization of this principle. You build up inner monomorphic versions of the mutually recursive functions, then generalize them in a tuple, and extract the polymorphic versions from the tuple.

All this happens independently of whether the bindings are actually polymorphic or not. It's sufficient for them to be recursive. I think this behaviour of GHC is completely correct and ok, in particular because optimisation takes care of the performance hit.

kosmikus
  • 19,549
  • 3
  • 51
  • 66
  • But the fact that the behavior changes after adding an explicit type signature is unjustified, isn't it? – is7s Mar 19 '14 at 11:19
  • 4
    @is7s *Does* the behavior change? The result doesn't. GHC is "allowed" to use any evaluation strategy it likes that doesn't change the result. `:sp` is peeking into implementation details that aren't normally "exposed". I too would be interested in a note on what the tuple construction achieves for GHC, though. – Ben Mar 19 '14 at 11:53
  • 1
    @Ben But it is an extra level of indirection, thus a performance hit. I would never think that adding the type which was already inferred correctly could make any difference in terms of performance! – is7s Mar 19 '14 at 12:15
  • 1
    @is7s I've edited my answer. I agree with Ben that GHC is allowed to do what it does. It doesn't make sens to complain about a potential performance hit in an unoptimized Haskell program. It would only be a problem if optimizations wouldn't take care of it. – kosmikus Mar 19 '14 at 12:59
  • `If a binding is polymorphic but doesn't have a type signature` ... but in our case the binding is actually monomorphic. The inferred type is `[Integer]`. – is7s Mar 19 '14 at 13:28
  • @is7s Quoting from my own answer: "All this happens independently of whether the bindings are actually polymorphic or not. It's sufficient for them to be recursive." I guess this could be changed, but perhaps it doesn't seem worth the special case. – kosmikus Mar 19 '14 at 13:45