4

I try to parse xml-like tags (but not a correct xml document..): the goal is to get back just "Flange width" without the beginning or trailing whitespaces, but with the internal ones.

open FParsec

let testParser =
    pstring "<desc>" .>>. spaces
    >>. manyCharsTill anyChar (spaces .>>. pstring "</desc>")

run testParser "<desc> Flange width </desc>"

The expected result if I understood the parser combinators:

the anyChar parser to keep swallowing chars unit the "till" parser which looks for spaces followed by the end tag succeeds.

What actually happens, is the "till" parser fails on the space before "width" (as it should) but short circuits the manyTill parser instead of letting the anyChar swallow that space and continue onwards.

Output:

val it : ParserResult<string,unit> =
  Failure:
Error in Ln: 1 Col: 15
<desc> Flange width </desc>
              ^
Expecting: '</desc>'

What am I not getting? or what would be an idiomatic solution here?

Balinth
  • 548
  • 4
  • 10
  • As a side-note, I've also found that just doing `|>> fun (s : string) -> s.Trim()` is sometimes easier and even faster than trying to skip over whitespace. – Tyrrrz Dec 27 '19 at 10:45

1 Answers1

5

The issue is that spaces parses successfully and moves the stream to the beginning of w. pstring "</desc>" then fails.

The end result is that the endp parser has failed, but it has changed the state (we've moved past the spaces). You want the parser to fail and not change the state (before the space). The docs for manyTill (which are referred to by manyCharsTill) explain this:

The parser manyTill p endp repeatedly applies the parser p for as long as endp fails (without changing the parser state).

You can do that using the .>>.? operator:

The parser p1 .>>.? p2 behaves like p1 .>>. p2, except that it will backtrack to the beginning if p2 fails with a non‐fatal error and without changing the parser state, even if p1 has changed the parser state.

So this instead:

let testParser =
    pstring "<desc>" .>>. spaces
    >>. manyCharsTill anyChar (spaces .>>.? pstring "</desc>")

See this fiddle for a working demo.

Charles Mager
  • 25,735
  • 2
  • 35
  • 45