I'm trying to make a simple parser with attoparsec. The production rules are along the lines of:

    block:  ?token> [inline]
    inline: <?token>foo<?> | anyText

In other words, a block starts with a literal `?`, followed by a token, followed by `>`, followed by a sequence of inlines.
An inline is either a hidden inline of the form `<?token>foo<?>`, or just any plain text.
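To make the grammar concrete, here is one way to model it as Haskell types. This is only a sketch: `Inline`, `Block`, their constructors and `blockTokens` are hypothetical names not taken from the question, and it assumes the body (`foo`) of a hidden inline is not needed in the final output.

```haskell
-- Hypothetical AST for the grammar; names are illustrative only.
data Inline
  = HiddenInline String String  -- <?token>body<?>: the token and the body
  | PlainText String            -- any other run of text
  deriving (Show, Eq)

-- ?token> followed by zero or more inlines
data Block = Block String [Inline]
  deriving (Show, Eq)

-- The tokens the parser is meant to extract: the block's own token
-- followed by the token of every hidden inline, in order.
blockTokens :: Block -> [String]
blockTokens (Block tok inls) = tok : [t | HiddenInline t _ <- inls]
```

For example, a line such as `?blk>hello <?tok>foo<?> world` would correspond to `Block "blk" [PlainText "hello ", HiddenInline "tok" "foo", PlainText " world"]`, whose tokens are `["blk", "tok"]`.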
Memory use explodes when I run the parser, and I'm not sure how to restructure it to avoid that. The point of the parser is to pull out those `token` values. Here is my implementation:

    {-# LANGUAGE OverloadedStrings #-}
    import Control.Applicative
    import Data.Attoparsec.Text
    import Data.Text (Text)
    import qualified Data.Text as T

    blockLine :: Parser [Text]
    blockLine = do
      block <- hiddenBlock                        -- the block token
      inlines <- many (hiddenInline <|> inline)   -- followed by inlines, which might have tokens
      return $ block : inlines

    inline :: Parser Text
    inline = T.pack <$> manyTill anyChar (hiddenInline <|> (endOfInput >> return T.empty))

    hiddenInline :: Parser Text
    hiddenInline = T.pack <$> do
      char '<'                                    -- opening "tag"
      char '?'                                    -- opening "tag" still
      token <- manyTill anyChar (char '>')        -- the token
      manyTill anyChar (string "<?>")             -- close the "tag"
      return token

    hiddenBlock :: Parser Text
    hiddenBlock = T.pack <$> do
      char '?'
      manyTill anyChar (char '>')

To me, this looks like a straightforward translation of the production rules into an LL parser. I suspect the difficulty is that I'm not sure how to express the production for an inline: it's supposed to be "arbitrary" text, but the parse should stop as soon as it reaches a `hiddenInline`.
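A very likely cause of the blowup: at the end of input, `hiddenInline` fails but `inline` still succeeds without consuming anything, because its `endOfInput` terminator matches immediately and `manyTill` returns an empty string. `many` then applies `inline` forever, building an unbounded list of empty results. A second problem is that when `inline` stops at a hidden inline, that hidden inline is consumed as the terminator of `manyTill` and its token is thrown away. The sketch below is one way to express the inline production, assuming the whole line is parsed at once with `parseOnly` and that only the tokens are wanted (plain text is discarded). Every alternative inside `many` consumes at least one character, and plain text stops *before* a `<` instead of swallowing the hidden inline. `plainText` and `inlines` are hypothetical helper names.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text
  (Parser, anyChar, char, endOfInput, manyTill, parseOnly, string, takeWhile1)
import Data.Text (Text)
import qualified Data.Text as T

-- "?token>": the block's own token.
hiddenBlock :: Parser Text
hiddenBlock = char '?' *> (T.pack <$> manyTill anyChar (char '>'))

-- "<?token>foo<?>": keep the token, skip the body and the closing "<?>".
hiddenInline :: Parser Text
hiddenInline = do
  _ <- string "<?"
  token <- manyTill anyChar (char '>')
  _ <- manyTill anyChar (string "<?>")
  return (T.pack token)

-- Hypothetical helper: a run of ordinary text. It always consumes at
-- least one character and stops before the next '<', so a following
-- hidden inline is left in the input for the next iteration. A lone '<'
-- that does not start a hidden inline is consumed as text.
plainText :: Parser ()
plainText = (() <$ takeWhile1 (/= '<')) <|> (() <$ char '<')

-- Hypothetical helper: zero or more inlines, keeping only hidden-inline
-- tokens. Both alternatives consume input, so `many` terminates.
inlines :: Parser [Text]
inlines = concat <$> many ((pure <$> hiddenInline) <|> ([] <$ plainText))

-- The whole line must be consumed; the result is the block token
-- followed by every hidden-inline token.
blockLine :: Parser [Text]
blockLine = (:) <$> hiddenBlock <*> inlines <* endOfInput
```

With these definitions, `parseOnly blockLine "?blk>hello <?tok>foo<?> world"` should return `Right ["blk","tok"]`. Because attoparsec backtracks on failure, trying `hiddenInline` first and falling back to `plainText` is safe even when a `<` turns out not to start a hidden inline.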