I am writing a lexer and a parser for Excel formulas.

In Excel, we can assign a name to a cell. For example, abc is a valid name, whereas naming a cell B2 is forbidden, to avoid confusion with the cell B2. So when we encounter the formula =B2, we can be sure that B2 refers to a cell rather than a user-defined name.

In my lexer_formula.mll, I have defined identifiers:

let lex_cell = ['A' - 'Z']+ ['0' - '9']+ (* regular expressions to include all the cells *)
let lex_name = ['A' - 'Z' '0' - '9']+ (* regular expressions to include all the names *)

But a string like B2 will match both lex_cell and lex_name. Does anyone know how I can tell the lexer to try lex_cell first, then lex_name? Is it sufficient to put lex_cell before lex_name in rule token = parse?

Brian Tompsett - 汤莱恩
SoftTimur

1 Answer

According to the ocamllex manual, it's sufficient to put lex_cell first:

If several regular expressions match a prefix of the input, the “longest match” rule applies: the regular expression that matches the longest prefix of the input is selected. In case of tie, the regular expression that occurs earlier in the rule is selected.
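
As a minimal sketch of how that rule plays out with the two definitions from the question (the token type and its constructors CELL, NAME and EOF are hypothetical, introduced only for illustration), the lexer file might look like this:

```ocaml
(* lexer_formula.mll: illustrative sketch; the token type is an assumption *)
{
  type token =
    | CELL of string
    | NAME of string
    | EOF
}

let lex_cell = ['A'-'Z']+ ['0'-'9']+
let lex_name = ['A'-'Z' '0'-'9']+

rule token = parse
  | lex_cell as s { CELL s }  (* listed first, so it wins ties such as "B2" *)
  | lex_name as s { NAME s }  (* still selected when its match is strictly longer *)
  | eof           { EOF }
```

Note that ordering only breaks ties. For an input such as B2C, lex_cell matches only B2 while lex_name matches all three characters, so the longest-match rule selects lex_name even though lex_cell is listed first.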

rici