I'm currently trying to write an interpreter in OCaml, and this is my lexer.mll:

{

    open Parser
    exception Eof
}


rule main = parse
      [ ' ' '\t' ]  { main lexbuf } 
    | [ '\n' ]  { EOL } 
    | ['0'-'9']+ as lxm { LINE_NUMBER(int_of_string lxm) }
    | [^\\]*\.(\w+)$  as lxm { FILE_NAME lxm }
    | "get_line"    { GET_LINE }    
    (*| [ ^-?\b([0-9]{1,3}|1[0-9]{3}|20[0-4][0-9]|205[0-5])\b ]     { RANGE }   (* -2055 < RANGE < 2055 *)*)
    | eof   { raise Eof }

I'm really confused about why ocamllex gives me an error at the line { FILE_NAME lxm }. If I put #load "str.cma" at the beginning of my lexer, it reports a syntax error on that line.

Why does this happen?

EDIT

The rule should be [ [^\\]*\.(\w+)$ ] as lxm { FILE_NAME lxm }

But the problem is still not solved.

Benoît Guédas
Trung Bún

1 Answer

Several parts of your regexp are not recognized by ocamllex:

  • \\: write it between single quotes ('\\') to match the "\" character;
  • \.: put the dot between single quotes ('.') to match a literal dot;
  • \w: ocamllex does not seem to know this escape sequence, so you need to define your own character class;
  • $: this is not recognized either, so you need to define your line endings yourself.

So first, put these definitions before the lexing rules:

let w = ['a'-'z' 'A'-'Z' '0'-'9' '_']
let eol = '\n' | "\r\n"

Then change the regexp of your FILE_NAME rule to:

[^'\\' '\n']*'.'w+eol

The matched expression (lxm) will contain the line ending sequence ('\n' or "\r\n"), so you need to remove it.
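
One way to remove the line ending is a small helper in plain OCaml. This is only a minimal sketch: the name strip_eol is hypothetical (it is not part of ocamllex or of the original answer), and it assumes the lexeme ends with at most one '\n' or "\r\n", as the rule above guarantees.

```ocaml
(* Hypothetical helper, not part of ocamllex: drop a single trailing
   "\r\n" or "\n" from a matched lexeme; other strings are returned as-is. *)
let strip_eol (s : string) : string =
  let n = String.length s in
  if n >= 2 && s.[n - 2] = '\r' && s.[n - 1] = '\n' then
    String.sub s 0 (n - 2)          (* Windows-style ending *)
  else if n >= 1 && s.[n - 1] = '\n' then
    String.sub s 0 (n - 1)          (* Unix-style ending *)
  else
    s

(* Intended use in the lexer action, assuming the helper is defined in the
   header section of lexer.mll:
     { FILE_NAME (strip_eol lxm) } *)
```

Alternatively, ocamllex can bind only part of a match: writing the rule as ([^'\\' '\n']* '.' w+ as lxm) eol should bind lxm to the file name without the line ending, so no helper is needed.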

Also, be careful when you match a string up to the end of the line: the default behavior is to match the longest possible string, so the rule can match more than one line at a time if your regexp accepts line endings. That is why I excluded '\n' from the character set.

Benoît Guédas