2

I recently took part in a competitive coding competition.

This Haskell gave a space leak on the judge system running ghc 7.6.3:

t n [] = 0
t n ('8':rest) = t (n+1) rest
t n (')':rest) = n + (t n rest)

main = getLine >>= (\l -> print (t 0 l))

After the competition the test cases were published. One of the failure cases was this: (a file containing 10^5 close parens): https://cses.fi/download/1/b575d19a75bf724b50fa4a399f8187b6d6edb4ccb62bd1a774f9294969152e46

The error was

Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize -RTS' to increase it.

I also know that the code was compiled with -O2 and -Wall on what I believe to be GHC 7.6.3.

For the life of me I can't reproduce the error with GHC 8.0.2 or 7.10.3 on my machine.

Is there an obvious space leak in the code?

Edit:

I tested the code as suggested below and a fold that was implemented thusly:

import Data.Foldable

t (kasit, score) '8' = (kasit+1, score)
t (kasit, score) _ = (kasit, score+kasit)

main = getLine >>= (\l -> print (snd (foldl' (t) (0, 0) l )))

Neither the bang pattern nor the reimplementation with the strict foldl' solved the problem, though the bangy one got more test cases through. The original still WOMM. Admittedly this is stepping outside of the scope of a generally useful stack exchange question and beginning to look like good old homework. This is also pretty undebugable without more knowledge of the judge system.

Mörkö
  • 211
  • 1
  • 14
  • GHC 7.6.3 is really, really, really old. I believe it was released in April of 2013, which is over five years ago. GHC has since released 7.8, 7.10, 8.0, and 8.2, along with minor releases in between. There is also an alpha release of 8.4 available. There's really no excuse for using 7.6.3 for a competition! That said, I don't think it's actually the compilers fault. – dfeuer Jan 15 '18 at 06:51
  • @dfeuer The answer to that is Ubuntu 14.04. They are stuck on an old lts and will update the images after the finals. – Mörkö Jan 15 '18 at 07:32

2 Answers2

5

Yes, the n parameter exhibits an "obvious" (to some) space leak: because it doesn't need to be inspected (e.g. you don't have a case t 0 ... = ...) you can build up thunks of additions in your recursive calls.

Better would be something like:

t _ [] = 0
t !n ('8':rest) = t (n+1) rest
t !n (')':rest) = n + (t n rest)

best would be probably to define this in terms of foldl'.

It's totally possible that a newer version of GHC than 7.6 would do a better job analyzing strictness and optimizing this code.

Useful factoid, forcing stack overflows can be an effective way to find space leaks (which usual manifest as heap usage): http://neilmitchell.blogspot.com/2015/09/detecting-space-leaks.html

jberryman
  • 16,334
  • 5
  • 42
  • 83
  • Thank you very much. I have contacted the competition's organizers to have them run the fixed version on the judge and see if the older ghc fails in the way you described. – Mörkö Jan 15 '18 at 05:13
  • Well, some test cases got resolved but that given one is still broken. I am calling it though: laziness seems to be the issue here. – Mörkö Jan 15 '18 at 06:22
  • This still isn't tail recursive; see my answer. – dfeuer Jan 15 '18 at 07:19
3

I think your last case is causing you trouble. You wrote

t n [] = 0
t n ('8':rest) = t (n+1) rest
t n (')':rest) = n + (t n rest)

Even if we strictify this as jberryman suggested,

t !n [] = 0
t !n ('8':rest) = t (n+1) rest
t !n (')':rest) = n + (t n rest)

the third case is not tail recursive. How can we fix this? Simply add another accumulator representing the amount to add on at the end.

t n0 xs = t' 0 n0 xs
  where
    t' !acc !_n [] = acc
    t' acc n ('8':rest) = t' acc (n + 1) rest
    t' acc n (')':rest) = t' (acc + n) n rest
dfeuer
  • 48,079
  • 5
  • 63
  • 167
  • This appears to work on the judge system. Now what remains to be explained is why every other variation including the original works on GHC 7.10.2 but not on the judge, which just now was claimed to be 7.6.3 by the organizers. Edit: ah, it seems that 7.10.2 is newer – Mörkö Jan 15 '18 at 07:28
  • @Mörkö, part of the problem may be that old GHC versions kept a separate stack of limited size; more modern versions keep the stack on the heap (IIUC) and let it get as big as it needs to. – dfeuer Jan 15 '18 at 07:40
  • Well that's nice to know. Maybe I'll ask the organizers to increase the stack size as there is already a global memory limit of 512 megs. Can there be adverse effects? – Mörkö Jan 15 '18 at 07:43
  • @Mörkö, unfortunately, I can't personally answer such detailed questions about such an old compiler. Sorry. It's hard enough for me to keep track of performance with the new ones. – dfeuer Jan 15 '18 at 07:50
  • That's understandable, I will try to come up with something. – Mörkö Jan 15 '18 at 07:50
  • My main point was whether increasing the stack size beyond an external memory limit (set by the judge) risks delaying gc until the program goes through said limit. – Mörkö Jan 15 '18 at 07:58
  • @Mörkö, again, I really don't know many details about 7.6.3. Your best bet might be to download a binary distribution of that version and experiment locally with different options. In my view, your original implementation is broken, and the right thing to do is rewrite it. I understand that's not what you want to hear as a contest participant. – dfeuer Jan 15 '18 at 08:06
  • No, I whole heatedly agree with it being broken, but by being broken it showcased what in my opinion is a problem with the way the judge is set up. That is the stack being limited to 8 megs. – Mörkö Jan 15 '18 at 08:11
  • Yeah mine was a sloppy answer, sorry. The stack overflow is surely from the last case which is neither tail-recursive nor productive. And looking at the core in 8.2 it does look like the lack of stack overflow must be due to a growable stack – jberryman Jan 15 '18 at 15:37