If so, is this part of the standard, or a GHC-specific optimisation we can depend on? Or is it just an optimisation that we can't necessarily depend on?

P.S.: When I tried a test sample, it seemed to indicate that this was taking place:

Prelude> let isOdd x = x `mod` 2 == 1
Prelude> let isEven x = x `mod` 2 == 0
Prelude> ((filter isOdd).(filter isEven)) [1..]

Chews up CPU but doesn't consume much memory.

Daniel Fischer
Roman A. Taycher

2 Answers

Depends on what you mean by generator. The list is lazily generated, and since nothing else references it, the consumed parts are garbage collected almost immediately. Since the result of the above computation doesn't grow, the entire computation runs in constant space. That is not mandated by the standard, but as it is harder to implement nonstrict semantics with different space behaviour for that example (and lots of vaguely similar), in practice you can rely on it.
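As a rough way to observe this, here is a minimal sketch that swaps the infinite `[1..]` for a finite range so that the program terminates. The name `survivors` and the bound passed to it are assumptions made for illustration; `isOdd` and `isEven` are the definitions from the question.

```haskell
-- The two predicates from the question, specialised to Int.
isOdd, isEven :: Int -> Bool
isOdd  x = x `mod` 2 == 1
isEven x = x `mod` 2 == 0

-- Hypothetical helper: counts the elements that pass both filters.
-- The list cells are produced on demand and consumed by `length`,
-- so nothing holds on to the part of the list already traversed.
survivors :: Int -> Int
survivors n = length ((filter isOdd . filter isEven) [1 .. n])
```

If you wrap this in a `main`, compile with `-rtsopts`, and run with `+RTS -s`, the statistics report lets you compare maximum residency across different values of `n`.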

But normally, the list is still generated as a list, so a lot of garbage is produced. Under favourable circumstances, GHC eliminates the list [1 .. ] and produces a non-allocating loop:

result :: [Int]
result = filter odd . filter even $ [1 .. ]

(I used the Prelude functions odd and even out of laziness.) Compiled with -O2, this generates the following Core:

List.result_go =
  \ (x_ayH :: GHC.Prim.Int#) ->
    case GHC.Prim.remInt# x_ayH 2 of _ {
      __DEFAULT ->
        case x_ayH of wild_Xa {
          __DEFAULT -> List.result_go (GHC.Prim.+# wild_Xa 1);
          9223372036854775807 -> GHC.Types.[] @ GHC.Types.Int
        };
      0 ->
        case x_ayH of wild_Xa {
          __DEFAULT -> List.result_go (GHC.Prim.+# wild_Xa 1);
          9223372036854775807 -> GHC.Types.[] @ GHC.Types.Int
        }
    }

A plain loop, running from 1 to maxBound :: Int, producing nothing on the way and [] at the end. It's almost smart enough to simply return []. Note that there's only one division by 2: GHC knows that if an Int is even, it can't be odd, so that check has been eliminated, and no branch creates a non-empty list (i.e., the unreachable branches have been eliminated by the compiler).
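For readers who find Core hard to read, the following is a hand-written source-level analogue of that loop, not something GHC emits. The names `resultUpTo` and `go` and the `limit` parameter are assumptions for illustration; with `limit = maxBound` it would walk the same range as the Core above, and it assumes `limit >= 1`.

```haskell
-- Hand-written analogue of the compiled loop (illustrative only).
-- Since no Int is both even and odd, no cons cell is ever built:
-- the loop just advances the counter and returns [] at the end.
resultUpTo :: Int -> [Int]
resultUpTo limit = go 1
  where
    go :: Int -> [Int]
    go x
      | x >= limit = []          -- end of the range: the only value ever returned
      | otherwise  = go (x + 1)  -- nothing passes both filters, so just advance
```

Unlike the real Core, this sketch also drops the remainder computation, since both of its branches do the same thing.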

Daniel Fischer
  • If you did `filter (\x -> odd x && even x) [1..]` would it be smart enough to somehow turn the lambda into, effectively, `const False`? – Tyler Nov 27 '11 at 22:44
  • @MatrixFrog No, not yet. It can't turn `(\x -> odd x && even x)` into `const False` in general, because of bottom, it would have to be `\x -> seq x False` (1). In this case, for `Int` and some other known types, the compiler sort of knows that `[1 .. ]` doesn't contain bottoms, so it could, but it doesn't analyse enough to bring the two together (I doubt such a situation occurs often enough that such an analysis would be worthwhile). (1) For some known types. In general, the remainder mod 2 has to be calculated and compared to 0, because those calculations could be nonterminating. – Daniel Fischer Nov 27 '11 at 23:41
  • @MatrixFrog In fact, there are perfectly legitimate instances for which that optimization would be _wrong_, even ignoring bottoms. For example, the `Expr` type provided by the `simple-reflect` package breaks a lot of "laws" that we often assume of `Num` and `Integral` instances -- like this one. – Daniel Wagner Nov 28 '11 at 03:43
  • @DanielWagner Since `odd = not . even`, it can never return `True`, so it's either `False` or nontermination, or am I missing something? – Daniel Fischer Nov 28 '11 at 04:31
  • @DanielFischer Of course you're right. I had thought that `even` and `odd` were both defined in terms of `mod`, but didn't check it -- that'll teach me. – Daniel Wagner Nov 28 '11 at 16:20
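The difference between `const False` and `\x -> seq x False` raised in the comments above can be sketched as follows. The names `lazyFalse`, `strictFalse`, and `diverges` are hypothetical, and `undefined` stands in for bottom here (a genuinely nonterminating argument could not be caught this way).

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Ignores its argument entirely, so it returns False even for bottom.
lazyFalse :: a -> Bool
lazyFalse = const False

-- Forces its argument first, so it is bottom whenever the argument is.
strictFalse :: a -> Bool
strictFalse x = x `seq` False

-- Reports whether forcing the given Bool throws an exception.
diverges :: Bool -> IO Bool
diverges b = do
  r <- try (evaluate b) :: IO (Either SomeException Bool)
  pure (either (const True) (const False) r)
```

Applied to `undefined :: Int`, `lazyFalse` yields `False` while `strictFalse` throws, which is why the two functions are not interchangeable in general.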

Strictly speaking, Haskell does not specify any particular evaluation model, so implementations are free to implement the language's semantics how they want. However, in any sane implementation, including GHC, you can rely on this running in constant space.

In GHC, computations like these result in a singly-linked list ending in a thunk representing the remainder of the list which has not yet been evaluated. As you evaluate this list, more of the list will be generated on demand, but since the beginning of the list is not referred to anywhere else, the earlier parts are immediately eligible for garbage collection, so you get constant space behavior.

With optimizations enabled, GHC is very likely to perform deforestation here, optimizing away the need for having a list at all, and the result will be a simple loop with no allocation performed.

hammar
  • An allocation-free loop is only possible for some types. With `Integer`, as you'd get without type signature, the 'loop counter' would still have to be a heap-allocated `Integer`. The main point, the elimination of the list, stands. – Daniel Fischer Nov 26 '11 at 12:28
  • Where can I read more about the deforestation process you referred to? – haskelline Nov 26 '11 at 23:18
  • @brence: The deforestation technique currently used by GHC on lists is called [foldr/build-fusion](http://www.haskell.org/haskellwiki/Correctness_of_short_cut_fusion#foldr.2Fbuild_4), while another form called [stream fusion](http://stackoverflow.com/questions/578063/what-is-haskells-stream-fusion) is used by the `vector` package. HaskellWiki also has an [extensive list of papers on various deforestation techniques](http://www.haskell.org/haskellwiki/Research_papers/Compilation#Fusion_and_deforestation). Those should be good places to start. – hammar Nov 27 '11 at 00:14
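To give a feel for the foldr/build idea mentioned above, here is a simplified sketch done by hand. It is not GHC's actual library code or rewrite-rule machinery: `build'`, `upTo`, `sumList`, and `sumUpToFused` are hypothetical names, and the fusion step is applied manually rather than by the compiler.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A list producer abstracted over the constructors it uses.
build' :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build' g = g (:) []

-- Producer: the list [1 .. n], written in terms of build'.
upTo :: Int -> [Int]
upTo n = build' (\cons nil ->
  let go i
        | i > n     = nil
        | otherwise = cons i (go (i + 1))
  in go 1)

-- Consumer: a sum written as a foldr.
sumList :: [Int] -> Int
sumList = foldr (+) 0

-- The fusion law says  foldr k z (build' g) = g k z.
-- Applying it by hand to  sumList (upTo n)  replaces cons with (+)
-- and nil with 0, so no intermediate list is constructed.
sumUpToFused :: Int -> Int
sumUpToFused n = go 1
  where
    go i
      | i > n     = 0
      | otherwise = i + go (i + 1)
```

Both `sumList (upTo n)` and `sumUpToFused n` compute the same sum; the second is roughly the shape of code you would hope the optimiser arrives at after fusion.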