
According to this article,

Enumerations don't count as single-constructor types as far as GHC is concerned, so they don't benefit from unpacking when used as strict constructor fields, or strict function arguments. This is a deficiency in GHC, but it can be worked around.

Instead, the use of newtypes is recommended. However, I cannot verify this claim with the following code:

{-# LANGUAGE MagicHash,BangPatterns #-}
{-# OPTIONS_GHC  -O2 -funbox-strict-fields -rtsopts -fllvm -optlc --x86-asm-syntax=intel #-}
module Main (main, f, g) where
import GHC.Base  
import Criterion.Main

data D = A | B | C
newtype E = E Int deriving(Eq)

f :: D -> Int#
f z | z `seq` False = 3422#   -- this guard forces z but never matches, making f strict in its argument
f z = case z of
  A -> 1234#
  B -> 5678#
  C -> 9012#

g :: E -> Int#
g z | z `seq` False = 7432#   -- likewise forces z; the guard never matches
g z = case z of
  (E 0) -> 2345#
  (E 1) -> 6789#
  (E 2) -> 3535#

-- boxed wrappers: whnf needs a lifted result type, not an unboxed Int#
f' x = I# (f x)
g' x = I# (g x)

main :: IO ()
main = defaultMain [ bench "f" (whnf f' A) 
                   , bench "g" (whnf g' (E 0)) 
                   ]
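
For reference, the module can be compiled and its output inspected roughly like this (the file name Main.hs is my assumption; the Intel-syntax option from the pragma is omitted here):

ghc -O2 -funbox-strict-fields -fllvm -S Main.hs                                   # leaves the assembly in Main.s
ghc -c -O2 -funbox-strict-fields -fforce-recomp -ddump-simpl Main.hs > Main.core  # optimised Core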

Looking at the assembly, the tags for each constructor of the enumeration D are actually unpacked and hard-coded directly into the instructions. Furthermore, the function f lacks error-handling code and is more than 10% faster than g. In a more realistic case I have also experienced a slowdown after converting an enumeration to a newtype. Can anyone give me some insight into this? Thanks.


2 Answers


It depends on the use case. For the functions you have, it's expected that the enumeration performs better. Basically, the three constructors of D become Ints (or rather Int#s) when the strictness analysis allows it, and GHC knows statically that the argument can only have one of the three values 0#, 1#, 2#, so it need not insert error-handling code for f. For E, there is no static guarantee that only one of three values is possible, so it has to add error-handling code for g, which slows things down significantly. If you change the definition of g so that the last case becomes

E _ -> 3535#

the difference vanishes completely or almost completely (I still get a 1–2% better benchmark for f, but I haven't done enough testing to be sure whether that's a real difference or an artifact of benchmarking).
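
For concreteness, the modified g would read roughly like this (only the last alternative differs from the version in the question):

g :: E -> Int#
g z | z `seq` False = 7432#
g z = case z of
  (E 0) -> 2345#
  (E 1) -> 6789#
  E _   -> 3535#   -- catch-all: the match is now exhaustive, so no error branch is needed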

But this is not the use case the wiki page is talking about. What it's talking about is unpacking the constructors into other constructors when the type is a component of other data, e.g.

data FooD = FD !D !D !D

data FooE = FE !E !E !E

Then, if compiled with -funbox-strict-fields, the three Int#s can be unpacked into the constructor of FooE, so you'd basically get the equivalent of

struct FooE {
    long x, y, z;
};

while the fields of FooD have the multi-constructor type D and cannot be unpacked into the constructor FD (1), so that would basically give you

struct FooD {
    long *px, *py, *pz;
};

That can obviously have significant impact.
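
To make that concrete, here is a sketch of the same idea with explicit pragmas instead of relying on -funbox-strict-fields (types as defined above in this answer):

data FooE = FE {-# UNPACK #-} !E {-# UNPACK #-} !E {-# UNPACK #-} !E
-- with optimisation, each E (a newtype around Int) unpacks to an Int#,
-- so FE holds three unboxed machine words directly, like the first C struct

data FooD = FD !D !D !D
-- an UNPACK pragma would be ignored here: D has three constructors,
-- so each strict field stays a pointer to a D value, like the second C struct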

I'm not sure about the case of single-constructor function arguments. That has obvious advantages for types with contained data, like tuples, but I don't see how it would apply to plain enumerations, where you just have a case, and splitting off a worker and a wrapper makes no sense (to me).

Anyway, the worker/wrapper transformation isn't so much a single-constructor thing; constructor specialisation can give the same benefit to types with few constructors. (How many specialisations are created depends on the value of -fspec-constr-count.)
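
As an illustration (a sketch, not code from the question; loop is a made-up name), constructor specialisation applies to recursive functions that scrutinise an argument and are called with explicit constructors:

-- with -O2 (which enables SpecConstr), GHC may create specialised copies of
-- loop for the constructor patterns it is called with, removing the repeated
-- case analysis, as long as the number of specialisations stays within
-- -fspec-constr-count
loop :: D -> Int -> Int
loop _ 0 = 0
loop d n = case d of
  A -> loop B (n - 1)
  B -> loop C (n - 1)
  C -> loop A (n - 1)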


(1) That might have changed, but I doubt it. I haven't checked it though, so it's possible the page is out of date.

Daniel Fischer
  • Thank you very much! Actually, my real question was: aren't enumerations *generally* better than newtypes (of Int)? It seems to me there is no reason why enumerations cannot be unpacked in a strict field of a datatype. Can you *generally* recommend enumerations over newtypes where using enumerations makes the code clearer and safer? – mnish Oct 12 '12 at 21:40
  • If "generally" means in all cases, then the answer would be no. The performance advantage a newtype wrapper around `Int` can have is one thing (but in principle, it should be possible to unpack plain enumerations into other constructors too; GHC just doesn't/didn't, and there's probably a reason for that, perhaps complexity?). Another thing would be a type with many values: for an enumeration with two million values, a newtype would be more convenient. But the latter is exceptional, so what remains is the performance question. If that matters, and newtype is faster, then use that, else enumerations. – Daniel Fischer Oct 12 '12 at 21:52
  • (comment length limit reached) Generally, use enumerations, unless you have a good reason not to. Exceptional cases aside, they're cleaner, safer, and no less convenient. – Daniel Fischer Oct 12 '12 at 21:54
  • It seems to me GHC (7.4.1) actually does unpack enumerations on strict fields. But I'm not sure whether it's backend-specific or not. – mnish Oct 12 '12 at 22:05
  • No, I'm afraid not. I got ``Ignoring unusable UNPACK pragma on the second argument of `W'`` with `{-# UNPACK #-}` pragmas, and in the Core, a strict enumeration field is not unpacked, both with 7.4.2 and 7.6.1. – Daniel Fischer Oct 12 '12 at 22:14 (see the sketch after these comments)
  • I have another, hopefully fairer benchmark (and the assembly code) to show this. Is it rude to post a related question? Also, I don't quite consider Core a low-level language. After all, it is a full-fledged functional language with tail-call optimization... If that is a low-level language, then what about Emacs Lisp :-) I think it's Haskell that is so high-level. – mnish Oct 12 '12 at 23:35
  • It's not rude. If you have a related but sufficiently different question, post it. And no, Core isn't very low-level, but it's where most of the optimisations occur. Well, that's with the native backend; LLVM does quite a few afterwards. – Daniel Fischer Oct 12 '12 at 23:41
  • Ok, I have written it up. Please take a look if you like. – mnish Oct 13 '12 at 14:11
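
For reference, the declaration behind the warning quoted above would be along these lines (the name W and the field order are inferred from the warning text, not shown explicitly in the comments):

data W = W {-# UNPACK #-} !Int {-# UNPACK #-} !D
-- GHC unpacks the Int field, warns "Ignoring unusable UNPACK pragma
-- on the second argument of `W'", and leaves the strict D field
-- as a pointer to a D constructor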

I would guess that GHC has changed quite a bit since that page was last updated in 2008. Also, you're using the LLVM backend, so that's likely to have some effect on performance as well. GHC can (and will, since you've used -O2) strip any error handling code from f, because it knows statically that f is total. The same cannot be said for g. I would guess that it's the LLVM backend that then unpacks the constructor tags in f, because it can easily see that there is nothing else used by the branching condition. I'm not sure of that, though.

Ptharien's Flame
  • Thanks, but changing backends doesn't seem to affect the relative performance. I would really like to know what optimizations GHC is doing here. Can you give me some reference? – mnish Oct 12 '12 at 21:53
  • Did you look at the Core (-ddump-simpl)? – nponeccop Oct 13 '12 at 11:25
  • I did, but as Daniel said, there seems to be little indication at the Core level. I think the relevant optimization might be happening at a lower level. – mnish Oct 13 '12 at 13:02