Consider the following grammar, which has an obvious flaw as far as parser generators are concerned:
"Start Symbol" = <Foo>
"Case Sensitive" = True
"Character Mapping" = 'Unicode'
{A} = {Digit}
{B} = [abcdefABCDEF]
{C} = {A} + {B}
Integer = {A}+
HexNumber = {C}+
<ContextA> ::= '[' HexNumber ']'
<ContextB> ::= '{' Integer '}'
<Number> ::= <ContextA> | <ContextB>
<Foo> ::= <Number> <Foo>
| <>
This grammar is flawed because the scanner cannot distinguish between the terminals Integer and HexNumber. (Is 1234 an integer or a hex number?!)
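The ambiguity can be reproduced outside any parser generator. The sketch below is hypothetical and simply transcribes the two terminal definitions into Python regular expressions. It assumes that {Digit} means the ASCII digits 0-9. A lexeme such as 1234 fully matches both patterns at the same length, so a longest-match scanner has no way to choose between them.

```python
import re

# Terminal definitions transcribed from the grammar (assumption: {Digit} = 0-9):
#   {A} = {Digit}, {B} = [abcdefABCDEF], {C} = {A} + {B}
TERMINALS = {
    "Integer": re.compile(r"[0-9]+"),           # Integer   = {A}+
    "HexNumber": re.compile(r"[0-9a-fA-F]+"),   # HexNumber = {C}+
}


def matching_terminals(lexeme: str) -> list:
    """Return the names of every terminal whose pattern matches the whole lexeme."""
    return [name for name, pattern in TERMINALS.items() if pattern.fullmatch(lexeme)]


print(matching_terminals("1234"))  # both terminals match: ambiguous
print(matching_terminals("12ab"))  # only HexNumber matches
```

Any lexeme made only of decimal digits lands in both terminals. Only lexemes containing a letter from {B} are unambiguous.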
In this particular example the ambiguity may hardly matter, but there could be grammars where the context of the productions makes it clear whether an integer or a hex number is expected, and the scanner would still refuse to cooperate.
So the scanner would need to know the parser state in order to decide correctly between the hex and the integer token.
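The idea of a scanner that consults the parser state can be sketched by hand. The following is a minimal, hypothetical illustration and not how GOLD's generated scanner works. The names `ContextScanner` and `parse_foo` are invented for this sketch, and it again assumes that {Digit} means 0-9. The parser passes the scanner the terminals it can accept at each step, and the scanner tries only those terminals. After '[' only HexNumber is tried, and after '{' only Integer is tried, so 1234 is no longer ambiguous.

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical terminal patterns (assumption: {Digit} = 0-9).
PATTERNS = {
    "[": re.compile(r"\["),
    "]": re.compile(r"\]"),
    "{": re.compile(r"\{"),
    "}": re.compile(r"\}"),
    "Integer": re.compile(r"[0-9]+"),
    "HexNumber": re.compile(r"[0-9a-fA-F]+"),
}
WHITESPACE = re.compile(r"\s*")


class ContextScanner:
    """Scanner that only tries the terminals the parser says it can accept."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def next_token(self, expected: Sequence[str]) -> Optional[Tuple[str, str]]:
        """Return (terminal, lexeme) for the longest match among `expected`, or None."""
        self.pos = WHITESPACE.match(self.text, self.pos).end()
        best = None
        for name in expected:
            m = PATTERNS[name].match(self.text, self.pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            return None
        self.pos = best[1].end()
        return best[0], best[1].group()

    def at_end(self) -> bool:
        return WHITESPACE.match(self.text, self.pos).end() == len(self.text)


def parse_foo(text: str) -> list:
    """Recursive-descent sketch of <Foo> ::= <Number> <Foo> | <> (written as a loop)."""
    scanner = ContextScanner(text)
    numbers = []
    while not scanner.at_end():
        # <Number> ::= <ContextA> | <ContextB>: only '[' or '{' can start it.
        opener = scanner.next_token(("[", "{"))
        if opener is None:
            raise SyntaxError(f"expected '[' or '{{' at offset {scanner.pos}")
        if opener[0] == "[":
            kind, close = "HexNumber", "]"   # <ContextA> ::= '[' HexNumber ']'
        else:
            kind, close = "Integer", "}"     # <ContextB> ::= '{' Integer '}'
        value = scanner.next_token((kind,))  # the parser state picks the terminal
        if value is None or scanner.next_token((close,)) is None:
            raise SyntaxError(f"malformed {kind} context at offset {scanner.pos}")
        numbers.append(value)
    return numbers


print(parse_foo("[1234] {1234} [ff]"))
```

Here the same lexeme 1234 is returned as HexNumber inside brackets and as Integer inside braces. An input like {12ab} is rejected, because in that state the scanner only looks for an Integer followed by '}'.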
Now for the terminology question: what does this make this ... errm ... grammar? Or the lexer? A context-sensitive lexer? Or would one call this a context-sensitive grammar, even though it is clearly a scanner problem? Is there other terminology used to describe such phenomena?