
The following code computes numerical trees [1] for quantifiers of type Quant, which is similar to the type of the functions all and any:

treeOfNumbers :: [(Integer, Integer)]
treeOfNumbers =
  [0..] >>= \ row ->
  let 
    inc = [0 .. row]
    dec = reverse inc
  in
  zip dec inc

type Quant = (Integer -> Bool) -> [Integer] -> Bool

check :: Quant -> (Integer, Integer) -> Bool
check q (m,n) =
  q (\ d -> d - m > 0) [1 .. domN]
  where
    domN = m + n

genTree :: Quant -> [(Integer, Integer)]
genTree q =
  filter (check q) treeOfNumbers

For example, the value for take 10 $ genTree all is

[(0,0),(0,1),(0,2),(0,3),(0,4),(0,5),(0,6),(0,7),(0,8),(0,9)]

This code, however, appears to leak memory. With GHCi's heap limited to 100M, genTree all is interrupted around (0,1600). With the limit at 500M, it stops around (0,3950), after becoming very slow.

How can this be improved? I have limited experience with Haskell, and can only guess that perhaps my implementation of treeOfNumbers is the culprit; check works for large values without any problem.


[1] See Computational Semantics with Functional Programming (Jan van Eijck and Christina Unger), Cambridge University Press, 2010, pp. 157–159.

Pablo
  • Have you tried running it with [profiling enabled](https://www.haskell.org/ghc/docs/7.8.1/html/users_guide/profiling.html)? You could see the cost centers and get a better idea of what is causing the memory problem. – bheklilr Sep 17 '14 at 15:54

2 Answers


There's no memory leak here. You're explicitly telling it to hold on to the entire list of tuples by defining it as a top-level constant. Laziness means it won't generate a value until it's needed, but that doesn't help with retaining values. treeOfNumbers won't be garbage collected until the garbage collector can prove it'll never be used again. And some rough math suggests that by the time (0,1600) appears in the list, it will be holding on to approximately 1,280,000 tuples. That's going to eat a lot of memory.
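The rough figure above can be checked with a little arithmetic. Row r of treeOfNumbers contains r + 1 pairs, so all rows up to and including row r hold (r + 1)(r + 2) / 2 pairs in total. A minimal sketch of that count, where `pairsThroughRow` and `pairsThroughRowSlow` are hypothetical helper names that do not appear in the question:

```haskell
-- Number of pairs in rows 0 .. r of treeOfNumbers.
-- Row k holds k + 1 pairs, so the total is 1 + 2 + ... + (r + 1).
pairsThroughRow :: Integer -> Integer
pairsThroughRow r = (r + 1) * (r + 2) `div` 2

-- Direct summation, for cross-checking the closed form on small inputs.
pairsThroughRowSlow :: Integer -> Integer
pairsThroughRowSlow r = sum [k + 1 | k <- [0 .. r]]
```

Here `pairsThroughRow 1600` evaluates to 1282401, in line with the estimate of roughly 1,280,000 retained tuples.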

Carl
  • +1. Also, [ghci retains any top-level](http://stackoverflow.com/questions/24986296/io-monadic-assign-operator-causing-ghci-to-explode-for-infinite-list/24990602#24990602). You won't see the memory leak in a compiled program (apart from much GC time). – Zeta Sep 17 '14 at 16:19
  • All the internal names add to the problem too. In my answer I found that without any internal named entities, keeping only the top-level `treeOfNumbers`, it gets to (0,3000) in only 35M. – Will Ness Sep 17 '14 at 17:02
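One way to picture the retention discussed in these comments: a top-level list with no arguments is a constant, so every cell that has been forced stays reachable through its name. The sketch below assumes the generator is instead written as a function of its starting row, so that each call builds a fresh list which can be collected as it is consumed. The names `treeFrom` and `genTreeFrom` are hypothetical and not from the question. GHC's optimiser may still choose to share such a list, so this illustrates the idea and does not guarantee any particular memory profile:

```haskell
type Quant = (Integer -> Bool) -> [Integer] -> Bool

-- Pairs (m, n) with m + n = row, for every row from `start` onwards.
-- As a function rather than a top-level constant, its result is not
-- pinned by a global name.
treeFrom :: Integer -> [(Integer, Integer)]
treeFrom start =
  [ (m, row - m) | row <- [start ..], m <- [row, row - 1 .. 0] ]

-- Same test as in the question: the quantifier is applied to the
-- predicate (> m) over the domain [1 .. m + n].
check :: Quant -> (Integer, Integer) -> Bool
check q (m, n) = q (> m) [1 .. m + n]

genTreeFrom :: Integer -> Quant -> [(Integer, Integer)]
genTreeFrom start q = filter (check q) (treeFrom start)
```

With these definitions, `take 10 (genTreeFrom 0 all)` yields the same ten pairs as the example in the question.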

Don't name your interim entities unless you absolutely have to.

treeOfNumbers :: [(Integer, Integer)]
treeOfNumbers = [0..] >>= \ row -> zip [row, row-1 .. 0] [0 .. row]
-- equivalently:
--   = [p | row <- [0..], p <- zip [row, row-1 .. 0] [0 .. row]]

type Quant = (Integer -> Bool) -> [Integer] -> Bool

check :: Quant -> (Integer, Integer) -> Bool
check q (m,n) = q (> m) [1 .. m + n] 

genTree :: Quant -> [(Integer, Integer)]
-- Derivation, starting from: genTree q = filter (check q) treeOfNumbers
--   = [ p | p <- [p | row <- [0..], p <- zip [row, row-1 .. 0] [0 .. row]]
--         , check q p]
--   = [ p | row <- [0..], p <- zip [row, row-1 .. 0] [0 .. row]
--         , check q p]
genTree q =
  [ (m,n) | row <- [0..], (m,n) <- zip [row, row-1 .. 0] [0 .. row]
          , q (> m) [1 .. m + n]]

Now genTree all runs inside GHCi in near-constant memory, though it does slow down somewhat as it goes. It reached (0,3000) easily within a few seconds.

With the top-level treeOfNumbers kept, running filter (check all) treeOfNumbers reaches (0,3000) in 35M of memory, and takes about twice as long as the version above.

So it's not only about the top level name; all the interim names in your definitions cause the data retention too.

Also, do compile it with the -O2 flag and run it as a standalone executable for any sizable work.

Will Ness
  • It's not naming that's the issue, specifically. Using `reverse` is a significant issue that can be trivially avoided. Using two comprehensions allows them both to stream. Using `reverse` forces both lists into memory during the `zip`. – Carl Sep 17 '14 at 19:21
  • Using `reverse` without the naming, as in `zip (reverse [0..row]) [0..row]`, would force only one of them into memory. – Will Ness Sep 18 '14 at 08:47
  • Yes, `reverse` is always bad for memory use. The problem is `reverse`, not named subexpressions. – Carl Sep 18 '14 at 15:55
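The two ways of building a row discussed in these comments can be put side by side. This sketch uses the hypothetical names `rowReversed` and `rowStreaming`. It only shows that both produce the same pairs, and makes no measurement of memory use:

```haskell
-- Row built as in the question: `reverse` has to walk the whole of
-- [0 .. row] before it can yield its first element.
rowReversed :: Integer -> [(Integer, Integer)]
rowReversed row = zip (reverse [0 .. row]) [0 .. row]

-- Row built from two independent enumerations, each of which can be
-- produced one element at a time.
rowStreaming :: Integer -> [(Integer, Integer)]
rowStreaming row = zip [row, row - 1 .. 0] [0 .. row]
```

For instance, `rowStreaming 2` is `[(2,0),(1,1),(0,2)]`, the same list that `rowReversed 2` produces.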