Interestingly enough I tried your lexer on the code of my lexer/evaluator written in JS ;) You're right, it is not always doing well with regular expressions. Here some examples:
rexl.re = {
NAME: /^(?!\d)(?:\w)+|^"(?:[^"]|"")+"/,
UNQUOTED_LITERAL: /^@(?:(?!\d)(?:\w|\:)+|^"(?:[^"]|"")+")\[[^\]]+\]/,
QUOTED_LITERAL: /^'(?:[^']|'')*'/,
NUMERIC_LITERAL: /^[0-9]+(?:\.[0-9]*(?:[eE][-+][0-9]+)?)?/,
SYMBOL: /^(?:==|=|<>|<=|<|>=|>|!~~|!~|~~|~|!==|!=|!~=|!~|!|&|\||\.|\:|,|\(|\)|\[|\]|\{|\}|\?|\:|;|@|\^|\/\+|\/|\*|\+|-)/
};
This one is mostly fine - only UNQUITED_LITERAL
is not recognized, otherwise all is fine. But now let's make a minor addition to it:
rexl.re = {
NAME: /^(?!\d)(?:\w)+|^"(?:[^"]|"")+"/,
UNQUOTED_LITERAL: /^@(?:(?!\d)(?:\w|\:)+|^"(?:[^"]|"")+")\[[^\]]+\]/,
QUOTED_LITERAL: /^'(?:[^']|'')*'/,
NUMERIC_LITERAL: /^[0-9]+(?:\.[0-9]*(?:[eE][-+][0-9]+)?)?/,
SYMBOL: /^(?:==|=|<>|<=|<|>=|>|!~~|!~|~~|~|!==|!=|!~=|!~|!|&|\||\.|\:|,|\(|\)|\[|\]|\{|\}|\?|\:|;|@|\^|\/\+|\/|\*|\+|-)/
};
str = '"';
Now all after the NAME's
regexp messes up. It makes 1 big string. I think the latter problem is that String token is too greedy. The former one might be too smart regexp for the regex
token.
Edit: I think I've fixed the regexp for the regex
token. In your code replace lines 146-153 (the whole 'following characters' part) with the following expression:
([^/]|(?<!\\)(?<=\\)/)*
The idea is to allow everything except /
, also allow \/
, but not allow \\/
.
Edit: Another interesting case, passes after the fix, but might be interesting to add as the built-in test case:
case 'UNQUOTED_LITERAL':
case 'QUOTED_LITERAL': {
this._js = "e.str(\"" + this.value.replace(/\\/g, "\\\\").replace(/"/g, "\\\"") + "\")";
break;
}
Edit: Yet another case. It appears to be too greedy about keywords as well. See the case:
var clazz = function() {
if (clazz.__) return delete(clazz.__);
this.constructor = clazz;
if(constructor)
constructor.apply(this, arguments);
};
It lexes it as: (keyword, const), (id, ructor)
. The same happens for an identifier inherits
: in
and herits
.