Is this memoization working properly?

Question

I have been working on solving Project Euler #14 in Haskell for a while now, but for some reason I'm unable to get it working. I solved the problem in Groovy a while ago, and I think I'm using basically the same method here. However, the program runs incredibly slowly even when just finding the first 10,000 lengths, and I'm really lost as to why. I think I'm using memoization correctly, but I'm running out of memory even with smallish data sets in GHCi.

Here's what I've come up with so far.

```haskell
collatz = (map collatz' [0..] !!)
    where collatz' n
            | n == 1 = 1
            | n `mod` 2 == 0 = 1 + collatz (n `div` 2)
            | otherwise = 1 + collatz (3 * n + 1)
```

I'd be running `map collatz [1..1000000]` to get the answer to the problem, but `map collatz [1..10000]` gives me an out-of-memory error and also takes a good few seconds to finish.

If anyone could give me some insight into what the problem with this program is, that would be great! I've tried a lot of things, and I'm just stuck and need a hand.

Thanks!

I just did. I didn't get an out of memory error with the set `1..10000`, but it still took the same amount of time. I did get an out of memory error with the data set `1..100000`, and it was also really slow. — Benjamin Kovach, Aug 07 '12 at 16:07
Using a list for memoization is not a good option for this problem. There is a lot of indexing involved and each takes O(n) time. — is7s, Aug 07 '12 at 16:14
You could try one of the existing memoization libraries, such as [MemoTrie](http://hackage.haskell.org/package/MemoTrie). Many more ideas can be found on the [Memoization](http://www.haskell.org/haskellwiki/Memoization) page of the Haskell wiki. — Petr, Aug 07 '12 at 16:25
Also [data-memocombinators](http://hackage.haskell.org/package/data-memocombinators/) are very simple to use, your `collatz` becomes `collatz = integral collatz' where {- ... -}`. I get results for `map collatz [1..10000]` practically instantly. — Vitus, Aug 07 '12 at 16:31
@Vitus Yes, that library uses a much more efficient trie-based memoization for integral functions than a list, so it eliminates the memory problem and lookup-induced speed problem. In this concrete case, it's still slower for me than not using memoization, though. — kosmikus, Aug 07 '12 at 16:52
I believe that `collatz k = (head . map collatz') [k,(k-1)..1]` will greatly increase laziness and hence improve both time & space performance — recursion.ninja, May 01 '14 at 17:31
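To illustrate is7s's point about list indexing, here is a minimal sketch (not from the thread) that replaces the list with a boxed `Data.Array`, whose `(!)` is O(1) instead of the O(n) of `(!!)`. It also caps the table at a fixed size, in the spirit of a later comment's suggestion to memoize only up to some limit, so the table no longer grows with the largest intermediate term. The names `memoLimit`, `memoTable`, `collatzLen` and `compute` and the limit of 100000 are hypothetical choices, and it assumes a 64-bit `Int` so that `3 * n + 1` cannot overflow for starting values up to 1000000.

```haskell
import Data.Array (Array, listArray, (!))

-- Hypothetical cache bound: only starting values 1..memoLimit are memoized.
memoLimit :: Int
memoLimit = 100000

-- Boxed arrays are lazy in their elements, so each cell is a thunk that is
-- evaluated at most once, on first use. Indexing with (!) is O(1),
-- unlike (!!) on a list.
memoTable :: Array Int Int
memoTable = listArray (1, memoLimit) (map compute [1 .. memoLimit])

-- Number of terms in the sequence starting at n (so 1 gives 1, as in the
-- question's collatz). Assumes n >= 1.
collatzLen :: Int -> Int
collatzLen n
  | n <= memoLimit = memoTable ! n
  | otherwise      = compute n  -- above the limit: compute directly

-- One step of the recurrence; recursive calls go back through collatzLen,
-- so they hit the cache as soon as the sequence drops below the limit.
compute :: Int -> Int
compute 1 = 1
compute n
  | even n    = 1 + collatzLen (n `div` 2)
  | otherwise = 1 + collatzLen (3 * n + 1)
```

Values above the limit are recomputed on each visit, so the limit trades memory for repeated work; where the best cut-off lies is something to measure rather than assume.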

score 6 · Accepted Answer · answered Aug 07 '12 at 16:30


Memoization is working just fine here. In fact, it's working so well that it fills up all your memory. The intermediate terms of the Collatz sequence get quite large: the largest term that occurs in any sequence starting from a number between 1 and 1000000 is 2974984576. Because `(!!)` walks the memo list from the front and the list is kept alive between calls, every cell up to that index has to be allocated and retained. So this is the length of the list you are trying to build in memory.
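The size argument can be checked directly. The sketch below (hypothetical helper names `peak` and `largestTerm`, not from the answer) computes the largest term reached by each sequence, using `Integer` so that no intermediate value can overflow; `largestTerm n` is then the highest index the memo list would have to reach for starting values `1..n`.

```haskell
-- Largest term reached by the Collatz sequence that starts at n (n >= 1).
-- Integer is used so that no intermediate term can overflow.
peak :: Integer -> Integer
peak = go 0
  where
    go best 1 = max best 1
    go best n =
      let best' = max best n
          next  = if even n then n `div` 2 else 3 * n + 1
      in best' `seq` go best' next  -- keep the running maximum evaluated

-- Largest term over all sequences starting at 1..limit, i.e. the index
-- that (map collatz' [0..] !!) would have to reach.
largestTerm :: Integer -> Integer
largestTerm limit = maximum (map peak [1 .. limit])
```

For example, `peak 27` is 9232, far larger than the starting value, which is why a list indexed by sequence terms grows much faster than the range of starting values suggests.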

On the other hand, just directly implementing the Collatz function without memoization should work fine for this problem.
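A minimal sketch of that direct approach, using the same counting convention as the question (`collatz 1 == 1`) but no memo table at all. The names `collatzLen` and `longestBelow` are hypothetical. It assumes a 64-bit `Int`, since the answer's own figure of 2974984576 already exceeds the 32-bit `Int` range; on a 32-bit system `Integer` would be the safe choice.

```haskell
-- Number of terms in the Collatz sequence starting at n (n and 1 included),
-- computed directly with no memo table. Assumes n >= 1 and a 64-bit Int.
collatzLen :: Int -> Int
collatzLen 1 = 1
collatzLen n
  | even n    = 1 + collatzLen (n `div` 2)
  | otherwise = 1 + collatzLen (3 * n + 1)

-- Starting value below the limit that produces the longest chain
-- (ties go to the larger starting value, because tuples compare lexicographically).
longestBelow :: Int -> Int
longestBelow limit = snd (maximum [(collatzLen n, n) | n <- [1 .. limit - 1]])
```

`longestBelow 1000000` is then the Project Euler question itself; no table is built, so memory use stays small regardless of how large the intermediate terms get.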


kosmikus


Also, memoizing the numbers up to some limit might be a viable approach. – Vitus Aug 07 '12 at 16:34
That makes a lot of sense actually. Thank you! I guess in Groovy I was using a Map for memoization instead of a list. The implementation I'm using is still slow without the use of memoization though, so I'll have to figure another way to speed it up still, haha. – Benjamin Kovach Aug 07 '12 at 16:35
I did some tests with `ghc -O2` and it turns out that version without memoization is indeed the fastest. Memoizing first 200000 - 6.09 sec; no memoization at all - 5.80 sec; also using a worker-wrapper with a strict accumulator results in a runtime of 5.41 sec (note that I used `rem` instead of `mod` in all tests). Tested with `maximum . map collatz $ [1..1000000]`. – Vitus Aug 07 '12 at 16:56
@Vitus You're using `Integer`, I suppose? If you use `Int` or `Word` as far as the type takes you (with a 64-bit GHC, that is as far as you need; with 32 bits, you'll need to step into `Integer` a couple of times), you can speed it up by a factor of something like 10. A good memoisation then gives another factor of about 6. – Daniel Fischer Aug 07 '12 at 22:23
@DanielFischer: Yes, since there's no 64-bit GHC for Windows (at least as far as I know), I just slapped `Integer` on it and didn't bother with anything more sophisticated. Looks like I should've taken that into account, thanks! – Vitus Aug 08 '12 at 00:57
@Vitus Yes, not yet. I've heard 7.6 will have a 64-bit version on Windows too, but I don't know how reliable that is. Anyway, `Integer` is the correct type, unless you want to see how far you can push it; then use `Int` or `Word` wherever possible. – Daniel Fischer Aug 09 '12 at 13:12
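The comments above describe two refinements: a worker/wrapper loop with a strict accumulator using `rem`, and staying in a fixed-width type as long as possible, stepping into `Integer` only when needed. A minimal sketch combining both follows; it is not the commenters' actual code. It uses `Word32` so the overflow fallback is easy to exercise (replacing `Word32` with `Word` gives the 64-bit variant), and it uses `quot` in place of `div`, which is an assumption beyond the comments. The names `collatzLen`, `goSmall`, `goBig`, `safeMax` and `longestChain` are hypothetical.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Word (Word32)

-- Number of terms in the Collatz sequence from n (so 1 gives 1, as in the
-- question). Assumes n >= 1. The worker runs on fixed-width Word32 values
-- with a strict accumulator and steps into Integer only while 3n+1 would overflow.
collatzLen :: Word32 -> Int
collatzLen = goSmall 1
  where
    -- Largest n for which 3*n+1 still fits in a Word32.
    safeMax :: Word32
    safeMax = (maxBound - 1) `quot` 3

    -- Fast path: fixed-width arithmetic, rem/quot instead of mod/div.
    -- acc counts the terms seen so far, including n.
    goSmall :: Int -> Word32 -> Int
    goSmall !acc 1 = acc
    goSmall !acc n
      | n `rem` 2 == 0 = goSmall (acc + 1) (n `quot` 2)
      | n <= safeMax   = goSmall (acc + 1) (3 * n + 1)
      | otherwise      = goBig (acc + 1) (3 * toInteger n + 1)  -- would overflow

    -- Slow path: arbitrary precision until the value fits in Word32 again.
    goBig :: Int -> Integer -> Int
    goBig !acc n
      | n <= toInteger (maxBound :: Word32) = goSmall acc (fromInteger n)
      | even n    = goBig (acc + 1) (n `quot` 2)
      | otherwise = goBig (acc + 1) (3 * n + 1)

-- Longest chain length for starting values 1..limit, the shape of the
-- benchmark in the comments (maximum . map collatz).
longestChain :: Word32 -> Int
longestChain limit = maximum (map collatzLen [1 .. limit])
```

Because the accumulator is strict and the worker is tail-recursive, no chain of `(1 + ...)` thunks builds up. Any speedup relative to the `Integer` version should be measured with `ghc -O2` rather than taken from this sketch.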
