Consider a simple function from the recent question:
myButLast :: [a] -> a
myButLast [x, y] = x
myButLast (x : xs) = myButLast xs
myButLast _ = error "List too short"
We can ask GHC to give us the simplifier output with ghc -ddump-simpl (possibly with some additional flags, like -dsuppress-module-prefixes and -dsuppress-uniques). As I understand it, this is the last stage of compilation where the result still bears any resemblance to the original high-level code. So here is what it says:
-- RHS size: {terms: 21, types: 22, coercions: 0, joins: 0/0}
myButLast :: forall a. [a] -> a
[GblId,
 Arity=1,
 Str=<S,1*U>,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True, Guidance=IF_ARGS [30] 100 0}]
myButLast
  = \ (@ a) (ds :: [a]) ->
      case ds of {
        [] -> myButLast1 @ a;
        : x ds1 ->
          case ds1 of {
            [] -> myButLast1 @ a;
            : y ds2 ->
              case ds2 of {
                [] -> x;
                : ipv ipv1 -> myButLast_$smyButLast1 @ a y ipv ipv1
              }
          }
      }
What is happening here? Let us see.
Some sort of annotations are attached to the type signature, which now has an explicit quantifier. My guess is that they say "global identifier, unary, top level", which is all true for this function. The other annotations, like WorkFree=True and Str=<S,1*U>, are cryptic to me.
The "value" definition is now a lambda that accepts, in addition to a list, a type variable argument, and proceeds to study the list by case analysis. [] -> myButLast1 @ a is a glorified error call, so let us ignore it for now. The interesting part is the call to myButLast_$smyButLast1 (What kind of name is that? I thought the $ sign could not be part of an identifier.), which turns out to be a tail-recursive function that actually traverses the list.
And here it is, a single member of what we recognize as a mutually recursive block:
Rec {
-- RHS size: {terms: 13, types: 12, coercions: 0, joins: 0/0}
myButLast_$smyButLast1 [Occ=LoopBreaker]
  :: forall a. a -> a -> [a] -> a
[GblId, Arity=3, Caf=NoCafRefs, Str=<L,1*U><L,1*U><S,1*U>, Unf=OtherCon []]
myButLast_$smyButLast1
  = \ (@ a) (sc :: a) (sc1 :: a) (sc2 :: [a]) ->
      case sc2 of {
        [] -> sc;
        : ipv ipv1 -> myButLast_$smyButLast1 @ a sc1 ipv ipv1
      }
end Rec }
It is quite lucid, but it does have some features new to us, like the recursive block delimiter Rec ... end Rec and a cryptic remark [Occ=LoopBreaker]. The annotations are also different: the Unf array is empty, and a Caf field appears instead. I can only infer that an interesting Unf field is a property of names defined by the programmer, while myButLast_$smyButLast1 is created by the compiler.
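To make my reading concrete, here is a hand-written transcription of the two Core bindings back into ordinary Haskell. It is only a sketch of how I read the dump, not something GHC emits: the names myButLast' and butLastWorker are my own inventions, with butLastWorker standing in for myButLast_$smyButLast1.

```haskell
-- Hypothetical source-level counterpart of myButLast_$smyButLast1.
-- It carries the two most recently seen elements along the rest of the list
-- and returns the older one once the list runs out.
butLastWorker :: a -> a -> [a] -> a
butLastWorker prev _   []       = prev
butLastWorker _    cur (z : zs) = butLastWorker cur z zs

-- Hypothetical counterpart of the top-level myButLast binding:
-- it peels off two elements, then either answers directly
-- or hands the rest of the traversal to the worker.
myButLast' :: [a] -> a
myButLast' (x : y : rest) =
  case rest of
    []       -> x
    (z : zs) -> butLastWorker y z zs
myButLast' _ = error "List too short"

main :: IO ()
main = print (map myButLast' [[1, 2], [1, 2, 3], [1, 2, 3, 4 :: Int]])
-- prints [1,2,3]
```

If this transcription is right, the wrapper only handles the first two cons cells and every later step happens inside the worker, which matches the self-call in the Rec block.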
So, by line count, I can understand about half of what the simplifier gives me, but for some parts I cannot even begin to guess the meaning.
Is the premise, that the simplifier output is usually the most useful intermediate representation, correct?
Is my reading correct so far?
Is there a manual to all these cryptic remarks? What do they mean?