Parser skips lines

Question

I want to write a simple parser for a subset of Jade, generating some XmlHtml for further processing.

The parser is quite simple but, as is often the case with Parsec, a bit long. Since I don't know if I am allowed to make such long code posts, I have the full working example here.

I've dabbled with Parsec before, but rarely successfully. Right now, I don't quite understand why it seems to swallow the lines that follow. For example, the Jade input

.foo.bar
    | Foo
    | Bar
    | Baz

tested with parseTest tag txt, returns this:

Element {elementTag = "div", elementAttrs = [("class","foo bar")], elementChildren = [TextNode "Foo"]}

My parser seems to be able to deal with any kind of nesting, but never more than one line. What did I miss?

score 6 · Accepted Answer · answered May 05 '12 at 23:26


If Parsec cannot match the remaining input, it stops parsing at that point and silently ignores the rest. Here, the problem is that after parsing a tag, you don't consume the whitespace at the beginning of the line before the next tag, so Parsec cannot parse the remaining input and bails. (There might also be other issues; I can't test the code right now.)
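To illustrate the silent stop, here is a minimal sketch. It does not use the asker's parser: `textLine` is a hypothetical stand-in for a single `| text` line, and the input is shortened. It shows that a parser which matches a prefix reports success, and that appending `eof` makes the leftover input show up as an error instead.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for one "| text" line of the Jade input.
textLine :: Parser String
textLine = string "| " *> many (noneOf "\n")

-- Succeeds after the first line; the rest of the input is left unread.
firstOnly :: Either ParseError String
firstOnly = parse textLine "" "| Foo\n    | Bar"

-- Requiring eof turns the unread remainder into a visible parse error.
strict :: Either ParseError String
strict = parse (textLine <* eof) "" "| Foo\n    | Bar"

main :: IO ()
main = do
  print firstOnly
  print strict
```

Running the top-level parser with `<* eof` while debugging is a cheap way to find the exact position where parsing gives up.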

There are many ways to add something that consumes the spaces, but I am not familiar with Jade and don't know how its indentation syntax works, so I cannot tell you which way is the "correct" one. Just adding whiteSpace somewhere at the end of tag should do it.

By the way, you should consider splitting up your parser into a Lexer and Parser. The Lexer produces a token stream like [Ident "bind", OpenParen, Ident "tag", Equals, StringLiteral "longname", ..., Indentation 1, ...] and the parser parses that token stream (Yes, Parsec can parse lists of anything). I think that it would make your job easier/less confusing.
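As a rough sketch of the second half of that idea, the following parses a list of tokens instead of a String. The constructor names are taken from the example token stream above; `TokParser`, `satisfyTok` and `attribute` are hypothetical names introduced only for this illustration, and the lexer that would produce the tokens is not shown.

```haskell
import Text.Parsec
import Text.Parsec.Pos (incSourceColumn)

-- Token type modelled on the example token stream.
data Token
  = Ident String
  | OpenParen
  | Equals
  | StringLiteral String
  | Indentation Int
  deriving (Show, Eq)

-- A Parsec parser whose input stream is a list of tokens.
type TokParser a = Parsec [Token] () a

-- Accept one token if the function returns Just. Source positions are
-- only approximated here: every token advances the column by one.
satisfyTok :: (Token -> Maybe a) -> TokParser a
satisfyTok = tokenPrim show (\pos _ _ -> incSourceColumn pos 1)

-- Parses the token sequence  Ident, Equals, StringLiteral.
attribute :: TokParser (String, String)
attribute = do
  name <- satisfyTok (\t -> case t of Ident s -> Just s; _ -> Nothing)
  _    <- satisfyTok (\t -> if t == Equals then Just () else Nothing)
  val  <- satisfyTok (\t -> case t of StringLiteral s -> Just s; _ -> Nothing)
  return (name, val)

main :: IO ()
main = print (parse attribute "" [Ident "tag", Equals, StringLiteral "longname"])
```

In a real lexer each token would normally carry its own source position so that error messages point at the right place in the original file.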

answered May 05 '12 at 23:26

dflemstr


Using the Token module is usually idiomatic for lexing with Parsec. However as Jade appears to use white-space significantly, I think you are right to suggest writing a separate scanner, see section 2.11 of the slightly out-of-date Parsec manual: http://research.microsoft.com/en-us/um/people/daan/download/parsec/parsec.pdf – stephen tetley May 06 '12 at 06:11
Unfortunately, I have added whitespace all around the `tag` and `tagP` parsers, and the result remains... – Lanbo May 06 '12 at 08:11
@Scán - that's a problem though. Jade appears to be indentation sensitive, so you can't use Parsec's `whiteSpace` or `lexeme` parsers directly, as they consume all white-space. Indentation sensitive parsing is quite advanced (there are some libraries to help on Hackage - I don't know how useful or documented they are). If you are new-ish to Parsec / parsing I'd recommend choosing a language that's a bit simpler than Jade to get some practice on. – stephen tetley May 06 '12 at 11:39
@stephentetley arguably since `whiteSpace` consumes white space up to a certain point, you'd still be able to get the parser column at that point and make it work. But I agree that the current parser isn't suited for this task. – dflemstr May 06 '12 at 11:41
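A small sketch of the column idea from the last comment, under the assumption that indentation consists of spaces only. `indentation` and `indentedLine` are hypothetical helpers, not part of the asker's code or of Parsec; `getPosition` and `sourceColumn` are Parsec functions.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Consume leading spaces, then read the current column. Parsec columns
-- start at 1, so the indentation depth is the column minus one.
indentation :: Parser Int
indentation = do
  skipMany (char ' ')
  pos <- getPosition
  return (sourceColumn pos - 1)

-- One line of input together with its indentation depth.
indentedLine :: Parser (Int, String)
indentedLine = do
  depth <- indentation
  txt   <- many (noneOf "\n")
  return (depth, txt)

main :: IO ()
main = print (parse indentedLine "" "    | Foo")
```

A parser for nested blocks could compare this depth with the depth of the enclosing tag to decide whether a line is a child or a sibling.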
