
I have a language I am making a parser for which contains function calls. A few function names are reserved and I would like to handle them differently in my grammar. In EBNF it would look like this:

FunctionCall ::= FunctionName '(' ')'
SpecialFunctionCall ::= SpecialFunctionName '(' ')'

FunctionName ::= VariableName - SpecialFunctionName

SpecialFunctionName ::= "special_function_a" | "special_function_b"

My problem is translating the EBNF exception operator (-) into flex.

FunctionName    {Letter}{LetterOrDigit}*

is a superset of SpecialFunctionName, which is a set of hard-coded strings:

SpecialFunctionName   "special_function_a"|"special_function_b"

Hence I get a warning from flex saying that the SpecialFunctionName rule can never be matched. Should I merge the tokens and compare the strings in the parser, or is there a recommended way to resolve this ambiguity in flex?

Lesmana
Akusete

2 Answers


The normal way of dealing with this is to have the lexical analyzer recognize the special names and return the correct token type (SpecialName) for them, and a regular identifier token (apparently FunctionName) for all other names.

However, it normally requires an undue degree of prescience on the part of the lexical analyzer to say that a particular (non-reserved, non-special) word is a function name rather than a simple identifier, which could also be a simple variable (unless you've gone down the Perl route of using sigils to distinguish variables from functions).

Jonathan Leffler
  • "it normally requires an undue degree of prescience", could you elaborate? – Akusete Nov 14 '10 at 23:23
    @Akusete: in many grammars, there is no lexical difference between an identifier used for a variable and an identifier used for a function (Perl is an exception with its sigils). So, for the lexical analyzer to determine that a particular name is a variable or function, it must have access to some non-lexical information (symbol table information). If all variables and functions must be declared/defined before use, then the necessary information may be available - and you've avoided the need for prescience. Languages like C are traditionally somewhat sloppy about this. [...continued...] – Jonathan Leffler Nov 14 '10 at 23:31
  • @Akusete: the alternative is that the lexical analyzer looks ahead some number of tokens and determines from context that the name it is looking at must be a function name rather than an identifier - but you normally struggle to avoid imbuing that much knowledge of the grammar into the lexical analyzer. – Jonathan Leffler Nov 14 '10 at 23:32
  • Thanks, I've managed to solve the ambiguity by having the lexer use the context of the previous token (in this language an operator must be preceded by a set of tokens that is mutually exclusive with variable names). – Akusete Nov 14 '10 at 23:39

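flex always takes the longest match, and when two rules match the same number of characters it picks the one listed first. So as long as you put the SpecialFunctionName rule FIRST in the lexer file: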

{SpecialFunctionName}    { return SpecialName; }
{FunctionName}           { return FunctionName; }

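any identifier that matches both patterns will trigger the first rule and thus return SpecialName instead of FunctionName. A longer name such as special_function_ab still matches only the FunctionName rule, so it is returned as FunctionName.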

Chris Dodd