I am having trouble getting Spirit to try alternatives. I am new to Spirit and probably doing something grossly wrong, so I apologize for dragging down the SNR but any help is appreciated:

I am using a grammar like the following to match "character classes":

'[' >> *(~ascii::char_("-]^") | (ascii::char_ >> '-' >> ascii::char_)) >> ']'

This matches [abc] but not [a-c]. If I remove the first alternative, then [a-c] matches. Shouldn't Spirit try the second alternative when the first fails?

Thanks,

Mike

user2913094

1 Answer

The basic problem is that the first alternative does match; it just doesn't match what you want it to. As written, your parser matches a sequence of three things, with the middle one being a repeated pattern that contains an alternative.

   '['                   // single char match
>> *(~ascii::char_("-]^") | (ascii::char_ >> '-' >> ascii::char_))  // complex pattern
>> ']'                   // single char match

So let's look at what happens when you try to match [a-c]. First, the parser matches the pattern '[', which succeeds and leaves a-c]. Next it looks at the complex pattern, which it tries to match zero or more times. Within that * repeat, it first tries ~ascii::char_("-]^"), which matches a, so that iteration succeeds, leaving -c]. It then repeats, trying the same pattern again. The first alternative fails (- is excluded), so it tries the second alternative: ascii::char_ matches -, but then '-' does not match c, so that alternative fails too. So at the end of the * repeat it has matched the single character a. Finally, it tries to match ']', which fails against the -, so the overall match fails.

Reading the above (if you can make sense of it) should make it clear what you need to do: have the parser try the range match BEFORE the single-char match within the * loop:

'[' >> *((ascii::char_ >> '-' >> ascii::char_) | ~ascii::char_("-]^")) >> ']'

Now it should match both [abc] and [a-c], as well as things like [a-cmx-z].

Chris Dodd
  • I was afraid that was going to be the answer. I've written many recursive descent parsers by hand, and when I find myself painted into the corner, I backtrack to where I could have taken a different branch of an alternation. It looks like Antlr also is willing to back up on this example and find a match. Is there any way to tell Spirit to keep trying the way Antlr does? – user2913094 Sep 02 '14 at 20:14
  • Boost.Spirit uses a PEG parser, which is strictly ordered and eager. Antlr uses an LL parser, which is completely different. The 'backtracking' in a PEG parser goes back only to the `|` alternatives, not to any other points. – Chris Dodd Sep 03 '14 at 02:20