
I'm trying to build a PCRE engine, and I'm using this ANTLR grammar. These are some of its rules:

octal_char
 : Backslash (D0 | D1 | D2 | D3) octal_digit octal_digit
 | Backslash octal_digit octal_digit
 ;

octal_digit
 : D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7
 ;

digit
 : D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 // just '0','1','2','3',...,'9'
 ;

When I try to trigger the octal_char rule with a string like \075, the rule never matches: the input is parsed as \0 followed by the separate digits 7 and 5, and I don't understand why.

Example parse tree for the string \075:

parse
  alternation
    expr
      element
        atom
          shared_atom \0
      element
        atom
          literal
            shared_literal
              digit 7
      element
        atom
          literal
            shared_literal
              digit 5
  <EOF>

1 Answer


The shared_atom rule precedes the literal rule in the atom rule.

Knowing nothing of the intent of the language, I can't tell whether that's an error, but that is what's catching the \0.

Depending on the intended semantics, you may need to reorder those rule references, modify lookahead, and/or use syntactic predicates to fix this.

Swapping the order of the two rule refs will get the octal to match, but may cause other things that shared_atom should match to get caught by literal and possibly fail.
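As a rough illustration of that swap, here is a minimal sketch. The real atom rule is not shown in the question, so the alternatives below are assumptions inferred from the parse tree; it also assumes octal_char is reachable through literal (for example via shared_literal) and that any other alternatives of atom stay where they are.

```
// Hypothetical sketch only: the actual atom rule is not shown in the question.
atom
 : literal       // tried first, so octal_char gets the chance to consume all of \075
 | shared_atom   // was listed first, which let it claim the \0 prefix
 ;
```

After a change like this, it is worth re-running inputs that shared_atom is meant to handle, since literal may now claim some of them.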

Scott Stanchfield