How to tell whether a Parsec parser uses constant heap space in Haskell

Question

In a recent question, I asked about the following Parsec parser:

{-# LANGUAGE BangPatterns, ScopedTypeVariables #-}
import Text.Parsec (ParsecT, (<|>))

manyLength
  :: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLength p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i = (p *> go (i + 1)) <|> pure i

This function is similar to many. However, instead of returning [a], it returns the number of times it was able to successfully run p.
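A minimal runnable sketch of that behaviour, assuming parsec 3.1.x. The module header, the BangPatterns pragma (for the !i pattern), the ScopedTypeVariables pragma (so the inner go signature can mention s, u and m) and the test inputs are editorial additions, not part of the original question.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Text.Parsec (ParsecT, char, parse, (<|>))

-- Count how many times p succeeds, like `length <$> many p`
-- but without building the list.
manyLength :: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLength p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i = (p *> go (i + 1)) <|> pure i

main :: IO ()
main = do
  -- Three 'a's are consumed; the trailing 'b' is left unconsumed.
  print (parse (manyLength (char 'a')) "" "aaab")  -- prints Right 3
  -- char 'a' fails without consuming, so the count is 0.
  print (parse (manyLength (char 'a')) "" "bbb")   -- prints Right 0
```

Note that manyLength does not require the whole input to be consumed; append eof if that matters.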

This works well, except for one problem: it doesn't run in constant heap space.

In the linked question, Li-yao Xia gives an alternative way of writing manyLength that uses constant heap space:

manyLengthConstantHeap
  :: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLengthConstantHeap p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ((p *> pure True) <|> pure False) >>=
        \success -> if success then go (i+1) else pure i

This is a significant improvement, but I don't understand why manyLengthConstantHeap uses constant heap space, while my original manyLength doesn't.

If you inline (<|>) in manyLength, it looks somewhat like this:

manyLengthInline
  :: forall s u m a. Monad m => ParsecT s u m a -> ParsecT s u m Int
manyLengthInline p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ParsecT $ \s cok cerr eok eerr ->
        let meerr :: ParseError -> m b
            meerr err =
              let neok :: Int -> State s u -> ParseError -> m b
                  neok y s' err' = eok y s' (mergeError err err')
                  neerr :: ParseError -> m b
                  neerr err' = eerr $ mergeError err err'
              in unParser (pure i) s cok cerr neok neerr
        in unParser (p *> go (i + 1)) s cok cerr eok meerr

If you inline (>>=) in manyLengthConstantHeap, it looks somewhat like this:

manyLengthConstantHeapInline
  :: forall s u m a. Monad m => ParsecT s u m a -> ParsecT s u m Int
manyLengthConstantHeapInline p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ParsecT $ \s cok cerr eok eerr ->
        let mcok :: Bool -> State s u -> ParseError -> m b
            mcok success s' err =
                let peok :: Int -> State s u -> ParseError -> m b
                    peok int s'' err' = cok int s'' (mergeError err err')
                    peerr :: ParseError -> m b
                    peerr err' = cerr (mergeError err err')
                in unParser
                    (if success then go (i + 1) else pure i)
                    s'
                    cok
                    cerr
                    peok
                    peerr
            meok :: Bool -> State s u -> ParseError -> m b
            meok success s' err =
                let peok :: Int -> State s u -> ParseError -> m b
                    peok int s'' err' = eok int s'' (mergeError err err')
                    peerr :: ParseError -> m b
                    peerr err' = eerr (mergeError err err')
                in unParser
                    (if success then go (i + 1) else pure i)
                    s'
                    cok
                    cerr
                    peok
                    peerr
        in unParser ((p *> pure True) <|> pure False) s mcok cerr meok eerr

Here is the ParsecT constructor for completeness:

newtype ParsecT s u m a = ParsecT
  { unParser
      :: forall b .
         State s u
      -> (a -> State s u -> ParseError -> m b) -- consumed ok
      -> (ParseError -> m b)                   -- consumed err
      -> (a -> State s u -> ParseError -> m b) -- empty ok
      -> (ParseError -> m b)                   -- empty err
      -> m b
  }
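To experiment with how the four continuations are threaded without Parsec's error bookkeeping, here is a hypothetical toy model. The names Toy, pureT, satisfyT, bindT, plusT and parseT are made up for this sketch; it is not Parsec's source. It assumes String input, no user state, no underlying monad, and errors reduced to plain Strings. bindT and plusT follow the shape of the inlined >>= and <|> above with error merging left out. It models control flow only and is not meant to reproduce Parsec's memory behaviour.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE RankNTypes #-}
module Main where

-- Hypothetical toy version of ParsecT's four-continuation encoding.
newtype Toy a = Toy
  { runToy
      :: forall r.
         String
      -> (a -> String -> r)  -- consumed ok
      -> (String -> r)       -- consumed err
      -> (a -> String -> r)  -- empty ok
      -> (String -> r)       -- empty err
      -> r
  }

-- Succeed without consuming input.
pureT :: a -> Toy a
pureT x = Toy $ \s _ _ eok _ -> eok x s

-- Consume one character satisfying the predicate, or fail without consuming.
satisfyT :: (Char -> Bool) -> Toy Char
satisfyT f = Toy $ \s cok _ _ eerr -> case s of
  (c : cs) | f c -> cok c cs
  _              -> eerr "unexpected input"

-- Sequencing in the style of the inlined >>=: once m has consumed input,
-- k's "empty" outcomes are reported through the consumed continuations.
bindT :: Toy a -> (a -> Toy c) -> Toy c
bindT m k = Toy $ \s cok cerr eok eerr ->
  let mcok x s' = runToy (k x) s' cok cerr cok cerr
      meok x s' = runToy (k x) s' cok cerr eok eerr
  in runToy m s mcok cerr meok eerr

-- Choice in the style of the inlined <|>: n runs only if m failed
-- without consuming, restarting from the saved input s.
plusT :: Toy a -> Toy a -> Toy a
plusT m n = Toy $ \s cok cerr eok eerr ->
  let meerr _ = runToy n s cok cerr eok eerr
  in runToy m s cok cerr eok meerr

-- Run to completion; consumed and empty outcomes are treated alike.
parseT :: Toy a -> String -> Either String a
parseT p input =
  runToy p input (\x _ -> Right x) Left (\x _ -> Right x) Left

-- The two counting parsers from the question, transcribed to Toy.
manyLengthT :: Toy a -> Toy Int
manyLengthT p = go 0
  where
    go :: Int -> Toy Int
    go !i = (p `bindT` \_ -> go (i + 1)) `plusT` pureT i

manyLengthConstT :: Toy a -> Toy Int
manyLengthConstT p = go 0
  where
    go :: Int -> Toy Int
    go !i =
      ((p `bindT` \_ -> pureT True) `plusT` pureT False) `bindT` \ok ->
        if ok then go (i + 1) else pureT i

main :: IO ()
main = do
  print (parseT (manyLengthT (satisfyT (== 'a'))) "aaab")       -- prints Right 3
  print (parseT (manyLengthConstT (satisfyT (== 'a'))) "aaab")  -- prints Right 3
```

Tracing either parser by hand through bindT and plusT shows which continuations each recursive go call receives.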

Why does manyLengthConstantHeap run with constant heap space, while manyLength does not? It doesn't look like the recursive call to go is in the tail-call position for either manyLengthConstantHeap or manyLength.

When writing parsec parsers in the future, how can I know the space requirements for a given parser? How did Li-yao Xia know that manyLengthConstantHeap would be okay?

I don't feel like I have any confidence in predicting which parsers will use a lot of memory on a large input.
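One empirical check, short of predicting it, is to run both versions on increasingly large inputs and compare GHC's runtime statistics. The sketch below assumes parsec 3.1.x and GHC; the command-line convention (an input size, optionally followed by the word constant) is an arbitrary choice for this harness. RTS options such as -s generally require linking with -rtsopts.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ScopedTypeVariables #-}
-- Build and run, for example:
--   ghc -O2 -rtsopts Main.hs
--   ./Main 10000000 +RTS -s
--   ./Main 10000000 constant +RTS -s
-- then compare the "maximum residency" line as the size grows.
module Main where

import System.Environment (getArgs)
import Text.Parsec (ParsecT, char, parse, (<|>))

manyLength :: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLength p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i = (p *> go (i + 1)) <|> pure i

manyLengthConstantHeap :: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLengthConstantHeap p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ((p *> pure True) <|> pure False) >>=
        \success -> if success then go (i + 1) else pure i

main :: IO ()
main = do
  args <- getArgs
  let n = case args of
        (a : _) -> read a
        []      -> 1000000
      useConstant = drop 1 args == ["constant"]
      parser = if useConstant then manyLengthConstantHeap else manyLength
  -- The input is a lazily generated String of n 'a's.
  print (parse (parser (char 'a')) "" (replicate n 'a'))
```

Roughly speaking, a maximum residency that stays flat as n grows points to constant space, while one that grows with n points to something being retained per iteration. This measures a particular build with particular flags, so it does not replace understanding why.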

Is there an easy way to figure out whether a given function will be tail-recursive in Haskell without running it? Or better yet, without compiling it?

Tail recursion isn't necessary or sufficient for Haskell code to run in constant space. — Carl, Mar 30 '17 at 13:29
@Carl That's interesting, thanks. Would you be able to add an answer explaining it? Or maybe point me to a paper or blog post explaining it? — illabout, Mar 30 '17 at 13:30
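A small sketch illustrating Carl's point, written for this page and not taken from the thread. The function names are made up. sumLazyAcc is tail-recursive but, without GHC's strictness analysis (for example in GHCi or at -O0), its accumulator becomes a chain of thunks. countFrom is not tail-recursive, yet a consumer that walks its list once can run in constant space because cells are produced on demand.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.List (foldl')

-- Tail-recursive with a lazy accumulator: unless optimisation makes it
-- strict, `acc + k` is deferred and n thunks pile up before the result.
sumLazyAcc :: Int -> Int
sumLazyAcc n = go 0 1
  where
    go acc k
      | k > n     = acc
      | otherwise = go (acc + k) (k + 1)

-- The same tail-recursive loop with a strict accumulator.
sumStrictAcc :: Int -> Int
sumStrictAcc n = go 0 1
  where
    go !acc k
      | k > n     = acc
      | otherwise = go (acc + k) (k + 1)

-- Not tail-recursive (the recursive call sits under (:)), but productive:
-- each cell can be garbage-collected once the consumer has moved past it.
countFrom :: Int -> [Int]
countFrom k = k : countFrom (k + 1)

main :: IO ()
main = do
  print (sumLazyAcc 1000)                                    -- prints 500500
  print (sumStrictAcc 1000)                                  -- prints 500500
  print (foldl' (+) 0 (takeWhile (<= 1000) (countFrom 1)))   -- prints 500500
```

What matters is what stays reachable (thunks, pending continuations, retained input), not only where the recursive call sits.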

