
I was analyzing the effect of where clauses on the performance of Haskell programs.

In Haskell: The Craft of Functional Programming by Thompson, chapter 20.4, I found the following example:

exam1 :: Int -> [Int]
exam1 n = [1 .. n] ++ [1 .. n]

exam2 :: Int -> [Int]
exam2 n = list ++ list
  where list = [1 .. n]

and, I quote,

The time taken to calculate [exam1] will be O(n), and the space used will be O(1), but we will have to calculate the expression [1 .. n] twice.

...

The effect [of exam2] is to compute the list [1 .. n] once, so that we save its value after calculating it in order to be able to use it again.

...

If we save something by referring to it in a where clause, we have to pay the penalty of the space that it occupies.

So I go ahead and assume that the -O2 flag will handle this and choose the best behavior for me, and I benchmark the two examples using Criterion.

import Criterion.Main

exam1 :: Int -> [Int]
exam1 n = [1 .. n] ++ [1 .. n]

exam2 :: Int -> [Int]
exam2 n = list ++ list
  where list = [1 .. n]

m :: Int
m = 1000000

main :: IO ()
main = defaultMain [ bench "exam1" $ nf exam1 m
                   , bench "exam2" $ nf exam2 m
                   ]

I compile with -O2, and find:

benchmarking exam1
time                 15.11 ms   (15.03 ms .. 15.16 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 15.11 ms   (15.08 ms .. 15.14 ms)
std dev              83.20 μs   (53.18 μs .. 122.6 μs)

benchmarking exam2
time                 76.27 ms   (72.84 ms .. 82.75 ms)
                     0.987 R²   (0.963 R² .. 0.997 R²)
mean                 74.79 ms   (70.20 ms .. 77.70 ms)
std dev              6.204 ms   (3.871 ms .. 9.233 ms)
variance introduced by outliers: 26% (moderately inflated)

What a difference! Why would that be? I thought that exam2 would be faster but less memory efficient (according to the quote above). Instead, it is actually much slower (and probably also less memory efficient, but that would need to be tested).

Maybe it is slower because [1 .. 1e6] has to be stored in memory, and this takes a lot of time. What do you think?
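
One way I could actually test the memory side (a sketch, assuming the benchmark above lives in a file called PerfList.hs) would be to compile with -rtsopts and ask the runtime for allocation statistics:

ghc -O2 -rtsopts PerfList.hs
./PerfList +RTS -s

The +RTS -s summary reports total bytes allocated and the maximum residency; per-example numbers would need separate runs, since the statistics cover the whole program.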

PS: I found a question that seemed related, but it turned out not to be.

– Dominik Schrempf
  • It looks like in the second example the compiler failed to inline `list`, so it actually calculates it, stores it in memory, and reads it back from memory. – talex Apr 29 '19 at 12:07
  • @talex Inlining `let x = expensiveComputation in f x x` can be harmful in the general case, since it can cause `x` to be computed twice. Using `where` we can have a similar issue. I think GHC is pretty conservative with inlining in this case, since it can cause performance disasters (e.g. doing two recursive calls instead of one can cause an exponential blowup). – chi Apr 29 '19 at 12:40
  • @chi Yes, but in this case inlining leads to no stored computation at all; the list is produced on the fly. – talex Apr 29 '19 at 13:29
  • @talex I agree, but I don't expect GHC to understand that, in this case, it can safely inline. Detecting when one can and when one cannot inline looks non-trivial. – chi Apr 29 '19 at 13:36

1 Answer


You can inspect GHC Core using -ddump-simpl and observe the optimized code GHC produced. Core is not as readable as Haskell, but usually one can still get the idea of what is going on.
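For example, assuming the code above sits in PerfList.hs (the module name that appears in the dumps below):

ghc -O2 -ddump-simpl PerfList.hs

Flags like -dsuppress-all can make the dump less noisy; the excerpts below keep some of the annotations.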

For exam2 we get plain boring code:

exam2
  = \ (n_aX5 :: Int) ->
      case n_aX5 of { GHC.Types.I# y_a1lJ ->
      let {
        list_s1nF [Dmd=<S,U>] :: [Int]
        [LclId]
        list_s1nF = GHC.Enum.eftInt 1# y_a1lJ } in
      ++ @ Int list_s1nF list_s1nF
      }

Roughly, this defines list_s1nF as [1..n] (eftInt = "enum from to") and calls ++. No inlining happened here. GHC was afraid to inline list_s1nF since it is used twice, and inlining a definition in such a case can be harmful. Indeed, if let x = expensive in x+x is inlined, expensive might get recomputed twice, which is bad. Here GHC trusts the programmer: if they used a let / where, they presumably want that value to be computed only once. Failing to inline list_s1nF prevents further optimization.
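To see the danger concretely, here is a small sketch (illustrative names; it uses trace from Debug.Trace to make evaluation visible; compile without optimizations, since -O2 may merge the two calls again via common-subexpression elimination):

import Debug.Trace (trace)

-- A stand-in for an expensive computation; trace prints its message
-- each time the thunk is actually forced.
expensive :: Int -> Int
expensive n = trace "computing!" (sum [1 .. n])

-- Shared: one thunk, forced once, then reused.
shared :: Int -> Int
shared n = let x = expensive n in x + x

-- What inlining x would turn it into: two independent thunks,
-- so the work (and the message) happens twice.
inlined :: Int -> Int
inlined n = expensive n + expensive n

main :: IO ()
main = do
  print (shared 1000000)   -- "computing!" appears once
  print (inlined 1000000)  -- "computing!" appears twice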

So this code allocates list = [1..n] and then copies it, producing 1:2:...:n:list, where the final tail pointer points back to the original list. Copying an arbitrary list requires following a pointer chain while allocating cells for the new list, which is intuitively more expensive than producing [1..n] directly, which only needs to allocate the new cells and keep a counter around.
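For reference, (++) is defined in the Prelude essentially like this, which shows why the first argument is rebuilt cons by cons while the second is only used as the final tail:

(++) :: [a] -> [a] -> [a]
[]     ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)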

By contrast, exam1 is optimized much further: after some minor unboxing

exam1
  = \ (w_s1os :: Int) ->
      case w_s1os of { GHC.Types.I# ww1_s1ov ->
      PerfList.$wexam1 ww1_s1ov
      }

we get to the actual worker function.

PerfList.$wexam1
  = \ (ww_s1ov :: GHC.Prim.Int#) ->
      let {
        n_a1lT :: [Int]
        [LclId]
        n_a1lT = GHC.Enum.eftInt 1# ww_s1ov } in
      case GHC.Prim.># 1# ww_s1ov of {
        __DEFAULT ->
          letrec {
            go_a1lX [Occ=LoopBreaker] :: GHC.Prim.Int# -> [Int]
            [LclId, Arity=1, Str=<L,U>, Unf=OtherCon []]
            go_a1lX
              = \ (x_a1lY :: GHC.Prim.Int#) ->
                  GHC.Types.:
                    @ Int
                    (GHC.Types.I# x_a1lY)
                    (case GHC.Prim.==# x_a1lY ww_s1ov of {
                       __DEFAULT -> go_a1lX (GHC.Prim.+# x_a1lY 1#);
                       1# -> n_a1lT
                     }); } in
          go_a1lX 1#;
        1# -> n_a1lT
      }

Here, the first "enum from to" [1..n] was inlined, and that also triggered the inlining of ++. The resulting recursive function go_a1lX relies only on (:) and basic arithmetic. When the recursion is over, n_a1lT is returned, which is the second "enum from to" [1..n]. That one is not inlined, since inlining it would trigger no further optimization.

Here, no list is generated and then copied, so we get better performance.
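Translated back into Haskell, the Core for exam1 corresponds roughly to this hand-written sketch (exam1', go and secondList are illustrative names, not GHC output):

exam1' :: Int -> [Int]
exam1' n
  | 1 > n     = secondList            -- empty range: result is just the second [1..n], i.e. []
  | otherwise = go 1
  where
    secondList = [1 .. n]             -- the un-inlined second "enum from to"
    go x                              -- fused loop for the first [1..n] ++ ...
      | x == n    = x : secondList    -- the last cons points at the second list
      | otherwise = x : go (x + 1)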

Note that this also produces optimized code:

exam3 :: Int -> [Int]
exam3 n = list1 ++ list2
  where list1 = [1 .. n]
        list2 = [1 .. n]

as does this one, since GHC won't automatically cache the results of function calls, and so these can safely be inlined.

exam4 :: Int -> [Int]
exam4 n = list () ++ list ()
  where list () = [1 .. n]
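
If you want to verify this, the Criterion benchmark from the question extends naturally (a hypothetical extension, reusing m and the definitions above):

main :: IO ()
main = defaultMain [ bench "exam1" $ nf exam1 m
                   , bench "exam2" $ nf exam2 m
                   , bench "exam3" $ nf exam3 m
                   , bench "exam4" $ nf exam4 m
                   ]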
– chi
  • Thank you! What is your opinion with respect to using let/where clauses? I often use them in situations where giving names to sub-expressions is useful for readability and to reduce code redundancy; in these cases I do not think about performance at all (actually, I thought that using where in exam2 would result in a speed benefit). Or is there a way to write more readable and less redundant code (like exam2) that runs as fast as exam1? – Dominik Schrempf Apr 30 '19 at 05:00
  • @DominikSchrempf I use let/where to structure code for readability first. Then, if performance is a concern, I can make some adjustments. If `let x = ...` is only used once, there is no difference, and the compiler will probably inline it. If it is used twice or more, one has to ask themselves "do I want this information to be stored and reused? or is it cheaper to recompute it?". Usually, if `x` has a fixed memory size (e.g. it's a single int), it's best to reuse it. If instead it is a list/tree/whatever, it depends. In your code I might go with `[1..n]++[1..n]`, `exam3` or `exam4` above. – chi Apr 30 '19 at 07:35
  • 1
    @DominikSchrempf Haskell performance is admittedly a bit of a black art. It is not as obvious to see what is going on since the optimizer might either do a great job or fail to do that unexpectedly. Laziness makes it harder to estimate performance, too. Still, with some experience, one quickly learns the most common performance pitfalls, and how to avoid them. E.g. it is known that if you use `++` a lot, passing large lists as a first argument, especially recursively, it is probably better to use "difference lists" instead of plain lists. – chi Apr 30 '19 at 07:40
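
For readers unfamiliar with the term, here is a minimal difference-list sketch (illustrative names): a list is represented by the function that prepends it, so concatenation becomes function composition and never retraverses the left operand.

type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList xs = (xs ++)            -- embed a plain list

toList :: DList a -> [a]
toList dl = dl []                -- materialize by appending the empty list

-- O(1) append: plain function composition, no traversal.
append :: DList a -> DList a -> DList a
append xs ys = xs . ys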