I'm writing a lexer and parser with ocamllex and ocamlyacc as follows. function_name and table_name are the same regular expression, i.e., a string containing only English letters. The only way to determine whether a string is a function_name or a table_name is to check its surroundings. For example, if such a string is surrounded by [ and ], then we know that it is a table_name. Here is the current code:
In lexer.mll:
... ...
let function_name = ['a'-'z' 'A'-'Z']+
let table_name = ['a'-'z' 'A'-'Z']+
rule token = parse
| function_name as s { FUNCTIONNAME s }
| table_name as s { TABLENAME s }
... ...
In parser.mly:
... ...
main:
| LBRACKET TABLENAME RBRACKET { Table $2 }
... ...
As I wrote | function_name as s { FUNCTIONNAME s } before | table_name as s { TABLENAME s }, the above code failed to parse [haha]; it first considered haha as a function_name in the lexer, and then it could not find any corresponding rule for it in the parser. If it could consider haha as a table_name in the lexer, it would match [haha] as a table in the parser.
One workaround for this is to be more precise in the lexer. For example, we define let table_name_with_brackets = '[' ['a'-'z' 'A'-'Z']+ ']' and | table_name_with_brackets as s { TABLENAMEWITHBRACKETS s } in the lexer. But I would like to know whether there are any other options. Is it not possible to make the lexer and parser work together to determine the tokens and the reductions?
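To make the workaround above concrete, here is a minimal sketch of what such a lexer.mll could look like. The token type in the header is a hypothetical stand-in so the file is self-contained (in the real project the tokens would come from parser.mly), and the whitespace and eof rules are assumptions added only for completeness. Unlike the definition above, this sketch binds only the name inside the brackets to s, so the token carries haha rather than [haha].

```ocaml
{
  (* Hypothetical stand-in for the token type that ocamlyacc would
     normally generate from parser.mly. *)
  type token =
    | FUNCTIONNAME of string
    | TABLENAMEWITHBRACKETS of string
    | EOF
}

(* A single shared definition for names made of English letters. *)
let name = ['a'-'z' 'A'-'Z']+

rule token = parse
  (* Brackets are consumed by the lexer; s is bound to the inner name only. *)
  | '[' (name as s) ']' { TABLENAMEWITHBRACKETS s }
  (* Any other run of letters is treated as a function name. *)
  | name as s           { FUNCTIONNAME s }
  (* Assumed for the sketch: skip whitespace and signal end of input. *)
  | [' ' '\t' '\n']     { token lexbuf }
  | eof                 { EOF }
```

With these rules, [haha] is lexed as one TABLENAMEWITHBRACKETS token and a bare haha as FUNCTIONNAME, because only the first rule can match at an opening bracket. The cost of this design is that the bracket syntax is fixed in the lexer: for example, [ haha ] with spaces inside the brackets would not match any rule as written.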