I want to match a constant, which is basically an all uppercase string.

Also, I want to match an identifier, which can contain a mix of lowercase and uppercase letters.

Start 
  = Constant
  / Identifier

Identifier
  = h:[A-Za-z_] t:[a-zA-Z0-9_]* { return { type: 'IDENTIFIER', value: h + t.join('') } }

Constant
  = h:[A-Z] t:[A-Z_0-9]* { return { type: 'CONSTANT', value: h + t.join('') } }

The problem is, when I try to match Asd, it says: Line 1, column 2: Expected [A-Z_0-9] or end of input but "s" found.

It seems it matches the Constant rule but doesn't swap to the Identifier one even when it fails...

The problem seems to be that a constant is also a valid identifier, but I can't figure out rules that break the ambiguity. I would expect that if the Constant match fails, the parser just tries the Identifier rule...

gosukiwi

1 Answer

This happens because parsing expression grammars (PEGs) are not like context-free grammars: the choice operator / is ordered and commits to the first alternative that succeeds. Constant is listed before Identifier, and for Asd it succeeds after consuming only A, since the tail [A-Z_0-9]* is allowed to match zero characters. Start is then finished with sd still unread, so the parser reports an error instead of going back to try Identifier. Fortunately, this is easy to fix with a negative lookahead:

Start 
  = Constant
  / Identifier

Identifier
  = h:[A-Za-z_] t:[a-zA-Z0-9_]* { return { type: 'IDENTIFIER', value: h + t.join('') } }

Constant
  = h:[A-Z] t:[A-Z_0-9]* ![a-z] { return { type: 'CONSTANT', value: h + t.join('') } }

Outputs:

{
   "type": "IDENTIFIER",
   "value": "Asd"
}

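PEGs are deterministic by design: alternatives are tried in order and the first one that succeeds wins, so a grammar is never ambiguous. When two rules overlap, as Constant and Identifier do here, the earlier one has to reject the inputs that belong to the later one.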
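
To make the behaviour concrete, here is a minimal JavaScript sketch that emulates the ordered choice by hand. It is not how PEG.js is implemented; the function names (constantNaive, constantWithLookahead, identifier, parse) are made up for illustration, and the lookahead is checked after the whole uppercase run, which is an assumption of this sketch.

```javascript
// Hand-written emulation of PEG ordered choice (illustrative, not PEG.js internals).
// Each rule returns { node, end } on success or null on failure.

// Constant = h:[A-Z] t:[A-Z_0-9]*
// Succeeds on any prefix that starts with an uppercase letter.
function constantNaive(input) {
  const m = /^[A-Z][A-Z_0-9]*/.exec(input);
  return m ? { node: { type: 'CONSTANT', value: m[0] }, end: m[0].length } : null;
}

// Same rule followed by ![a-z]: fail if a lowercase letter comes next.
// The check is done by hand because a regex lookahead would backtrack,
// which a PEG repetition does not.
function constantWithLookahead(input) {
  const r = constantNaive(input);
  if (r === null) return null;
  const next = input[r.end];
  if (next !== undefined && /[a-z]/.test(next)) return null;
  return r;
}

// Identifier = h:[A-Za-z_] t:[a-zA-Z0-9_]*
function identifier(input) {
  const m = /^[A-Za-z_][a-zA-Z0-9_]*/.exec(input);
  return m ? { node: { type: 'IDENTIFIER', value: m[0] }, end: m[0].length } : null;
}

// Start = Constant / Identifier
// Ordered choice: the first alternative that succeeds is kept, and the
// second is never tried afterwards, even if input is left over.
function parse(input, constantRule) {
  const r = constantRule(input) || identifier(input);
  if (r === null) {
    throw new Error('No rule matched');
  }
  if (r.end !== input.length) {
    throw new Error('Expected end of input but "' + input[r.end] + '" found');
  }
  return r.node;
}
```

With the naive rule, parse('Asd', constantNaive) throws because Constant succeeds on A alone and the choice is never revisited. With the lookahead, parse('Asd', constantWithLookahead) falls through to Identifier, while parse('ABC', constantWithLookahead) is still a constant.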

Marcelo Camargo