Memoized Collatz sequence

Question

I've posted the same question in CodeReview but failed to get an answer. so I am trying my luck here in SO.

Here is one of my programs that utilized memoization and array to improve performance and memory usage. The performance seems satisfactory but the memory usage is ridiculous and I can't figure out what's wrong:

{-# LANGUAGE BangPatterns #-}
import Data.Functor
import Data.Array (Array)
import qualified Data.Array as Arr
import Control.DeepSeq

genColtzArr n = collatzArr
    where collatzArr = Arr.array (1, n) $ take n $ map (\v -> (v, collatz v 0)) [1..] 
          collatz 1 !acc  = 1 + acc
          collatz !m !acc
              | even m    = go (m `div` 2) acc
              | otherwise = go (3 * m + 1) acc
              where go !l !acc
                      | l <= n    = let !v = collatzArr Arr.! l in 1 + acc + v
                      | otherwise = collatz l $ 1 + acc

collatz here means this guy. This function is supposed to receive a number n, and then return an array indexing from 1 to n, and in which each cell contains the length of the link from the index to 1 by applying Collatz formula.

But the memory usage of this method is so high. Here is the profiler result (ghc option -prof -fprof-auto -rtsopts, run time option +RTS -p, n == 500000):

total alloc = 730,636,136 bytes  (excludes profiling overheads)

COST CENTRE              MODULE  %time %alloc

genColtzArr.collatz      Main     40.4   34.7
genColtzArr.collatz.go   Main     25.5   14.4


COST CENTRE                      MODULE                    no.     entries  %time %alloc   %time %alloc     

      genColtzArr                Main                      105           1    0.0    0.0    74.7   72.1
       genColtzArr.collatzArr    Main                      106           1    8.0   20.8    74.7   72.1
        genColtzArr.collatzArr.\ Main                      107      500000    0.9    2.2    66.8   51.3
         genColtzArr.collatz     Main                      109     1182582   40.4   34.7    65.9   49.1
          genColtzArr.collatz.go Main                      110     1182581   25.5   14.4    25.5   14.4

Please note that -O2 is not a desired answer. I want to figure out what's the problem in this program and in general, how should I spot time and memory inefficiencies in Haskell code. Specifically, I have no idea why this code, with tail recursion and bang pattern, can consume so much memory.

UPDATE1:

the same code with -s produces this:

   1,347,869,264 bytes allocated in the heap
     595,901,528 bytes copied during GC
     172,105,056 bytes maximum residency (7 sample(s))
         897,704 bytes maximum slop
             315 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2408 colls,     0 par    0.412s   0.427s     0.0002s    0.0075s
  Gen  1         7 colls,     0 par    0.440s   0.531s     0.0759s    0.1835s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.828s  (  0.816s elapsed)
  GC      time    0.852s  (  0.958s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.004s  (  0.017s elapsed)
  Total   time    1.684s  (  1.791s elapsed)

  %GC     time      50.6%  (53.5% elapsed)

  Alloc rate    1,627,861,429 bytes per MUT second

  Productivity  49.4% of total user, 46.4% of total elapsed

so it takes 300 meg. that is still too large.

Update2

full code

{-# LANGUAGE BangPatterns #-}
import Data.Functor
import Data.Array (Array)
import qualified Data.Array as Arr
import Control.DeepSeq

genColtzArr n = collatzArr
    where collatzArr = Arr.array (1, n) $ take n $ map (\v -> (v, collatz v 0)) [1..] 
          collatz 1 !acc  = 1 + acc
          collatz !m !acc
              | even m    = go (m `div` 2) acc
              | otherwise = go (3 * m + 1) acc
              where go !l !acc
                      | l <= n    = let !v = collatzArr Arr.! l in 1 + acc + v
                      | otherwise = collatz l $ 1 + acc


genLongestArr n = Arr.array (1, n) llist
    where colatz = genColtzArr n
          llist  = (1, 1):zipWith (\(n1, a1) l2 -> 
                                    let l1 = colatz Arr.! a1
                                     in (n1 + 1, if l2 < l1 then a1 else n1 + 1)) 
                                  llist (tail $ Arr.elems colatz)


main :: IO ()
main = getLine >> do
    ns <- map read <$> lines <$> getContents
    let m          = maximum ns
    let lar        = genLongestArr m
    let iter []    = return ()
        iter (h:t) = (putStrLn $ show $ lar Arr.! h) >> iter t
    iter ns

"Please note that -O2 is not a desired answer." - are you saying you *aren't* compiling with optimizations? If you expect your Haskell code to be anywhere close to fast, you *must* compile with optimizations. Any time that `-O0` code runs fast is essentially a fluke. The next thing you can do is look at the [optimized core](https://hackage.haskell.org/package/ghc-core). — user2407038, Mar 09 '16 at 13:28
@user2407038 i've shown my ghc argument. I need to understand what my program is going on. it's gonna help for my other programs too. I am not expecting every time I try to improve my program, i will have to come here and ask questions. so for haskell, it's very easy for people to say `try -O2`, no, i want to focus on improving the program itself instead of relying on some optimization. — Jason Hu, Mar 09 '16 at 13:31
I know nothing about the operational semantics of a `-O0` Haskell program - the compiler does whatever it wants, so it is incredibly unpredictable which core passes will fire. Your only recourse is to look at the output of `-v3` and the final generated core. If the code looks stupidly ridiculously bad.. might be because you really do have to use `-O2` with GHC. There's no conspiracy, `-O2` is not somehow "cheating" - your program may have *no* issues with it and the compiler generates a bad program because `-O0` means "do this fast", not "do this well". — user2407038, Mar 09 '16 at 13:37
There are all kinds of space leak in your program: mutual recursion between `collatzArr` and `collatz`, a big long lazy array that cannot be GC'd on fly, etc, but what is your `main`? What's the IO action been evaluated? — zakyggaps, Mar 09 '16 at 14:31
@zakyggaps no it's not leaked. I want to get all link lengths from 1 to n. It's not a secret already I believe, it's project Euler 14.I will need to find the longest link within n. But I think the finding longest won't be too interesting to optimize. — Jason Hu, Mar 09 '16 at 14:58
As long as you don't provide your `main`, it's not possible to reproduce your behaviour in a way that will help others. For example, without changing your code, I get 70% productivity, compared to your less than 50% if I use `main = print $ maximum $ genColtzArr 500000` with 128 MB total memory usage. — Zeta, Mar 09 '16 at 15:44
@Zeta you have my full code now. I hope you can identify something. but till, i think 128MB is too high. it shouldn't take more than 10MB. — Jason Hu, Mar 10 '16 at 13:16

zakyggaps · Answer 1 · 2016-03-12T06:14:49.943

As the other answer on CodeReview hints, it's alright for a 500000-element boxed array to comsume ~20MB memory, however it's not only the array but a lot of things all together:

Although you put bang patterns every where, array initialization itself is a lazy foldr:

-- from GHC.Arr
array (l,u) ies
    = let n = safeRangeSize (l,u)
      in unsafeArray' (l,u) n
                      [(safeIndex (l,u) n i, e) | (i, e) <- ies]

unsafeArray' :: Ix i => (i,i) -> Int -> [(Int, e)] -> Array i e
unsafeArray' (l,u) n@(I# n#) ies = runST (ST $ \s1# ->
    case newArray# n# arrEleBottom s1# of
        (# s2#, marr# #) ->
            foldr (fill marr#) (done l u n marr#) ies s2#)

So unless you evaluated the last bit of an array, it's holding reference to the list used in initialization. Usually the list can be GC'd on fly while you evaluating the array, but in your case the mutual references and self references disturbed the common GC pattern.

llist is self-referencing to produce every single element, so it will not be GC'd until you evaluated the last element of it
it also holds a reference to genColtzArr so genColtzArr won't be GC'd until llist is fully evaluated
you might think collatz is tail recursive but it's not, it's mutual recursive with collatzArr so again both of them won't be GC'd until fully evaluated

Everything combined, your program will keep three 500000-element list-like structures in memory and results ~80MB peak heap size.

Solution

The obvious solution is to force every array / list to normal form before it's used in another so you won't keep multiple copys of the same data in the memory.

genLongestArr :: Int -> Array Int Int
genLongestArr n =
  let collatz = genColtzArr n
  -- deepseq genColtzArr before mapping over it
  -- this is equivalent to your recursive definition
  in collatz `deepseq` (Arr.listArray (1,n) $ fmap fst $ scanl' (maxWith snd) (0, 0) $ Arr.assocs collatz)

maxWith :: Ord a => (b -> a) -> b -> b -> b
maxWith f b b' = case compare (f b) (f b') of
  LT -> b'
  _  -> b

And in main:

-- deepseq lar before mapping over it
-- this is equivalent to your iter loop
lar `deepseq` mapM_ (print . (lar Arr.!)) ns

Nothing can be done with genColtzArr, it's using itself for memorization so the mutual recursion is kind of necessary.

Now the heap graph peaks at ~20MB as it should:

(Disclaimer: All programs in this answer were compiled with -O0)

Memoized Collatz sequence

1 Answers1

Solution