Specifically, I am trying to implement a RegExp parser in ANTLR.
Here are the relevant parts of my grammar:
grammar JavaScriptRegExp;
options {
    language = 'CSharp3';
}
tokens {
    /* snip */
    QUESTION = '?';
    STAR = '*';
    PLUS = '+';
    L_CURLY = '{';
    R_CURLY = '}';
    COMMA = ',';
}
/* snip */
quantifier returns [Quantifier value]
    : q=quantifierPrefix QUESTION?
      {
          var quant = $q.value;
          quant.Eager = $QUESTION == null;
          return quant;
      }
    ;
quantifierPrefix returns [Quantifier value]
    : STAR { return new Quantifier { Min = 0 }; }
    | PLUS { return new Quantifier { Min = 1 }; }
    | QUESTION { return new Quantifier { Min = 0, Max = 1 }; }
    | L_CURLY min=DEC_DIGITS (COMMA max=DEC_DIGITS?)? R_CURLY
      {
          var minValue = int.Parse($min.Text);
          if ($COMMA == null)
          {
              return new Quantifier { Min = minValue, Max = minValue };
          }
          else if ($max == null)
          {
              return new Quantifier { Min = minValue, Max = null };
          }
          else
          {
              var maxValue = int.Parse($max.Text);
              return new Quantifier { Min = minValue, Max = maxValue };
          }
      }
    ;
DEC_DIGITS
    : ('0'..'9')+
    ;
/* snip */
CHAR
    : ~('^' | '$' | '\\' | '.' | '*' | '+' | '?' | '(' | ')' | '[' | ']' | '{' | '}' | '|')
    ;
Now, INSIDE of the curly braces, I would like to tokenize ',' as COMMA, but OUTSIDE, I would like to tokenize it as CHAR.
Is this possible?
This is not the only place where the problem comes up. The same character needs a different token type depending on context in many other cases (decimal digits, hyphens in character classes, etc.).
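To make the behavior I am after concrete, here is a small hand-written sketch of the tokenization, in plain Java and independent of ANTLR. The class name ContextLexerSketch and the tokenize method are made up for illustration, and it simplifies a lot: it only tracks whether the current position is inside curly braces and treats every other character as CHAR.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextLexerSketch {
    // Hypothetical helper: returns the token type names for the input.
    // Simplification: every '{' is assumed to open a quantifier, and all
    // other metacharacters are lumped in with CHAR.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean insideBraces = false; // the "context" the lexer depends on
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '{') {
                insideBraces = true;
                tokens.add("L_CURLY");
                i++;
            } else if (c == '}') {
                insideBraces = false;
                tokens.add("R_CURLY");
                i++;
            } else if (insideBraces && c == ',') {
                // ',' is only a COMMA inside the braces
                tokens.add("COMMA");
                i++;
            } else if (insideBraces && c >= '0' && c <= '9') {
                // digits are only grouped into DEC_DIGITS inside the braces
                int start = i;
                while (i < input.length()
                        && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;
                }
                tokens.add("DEC_DIGITS(" + input.substring(start, i) + ")");
            } else {
                // everywhere else, the same characters are plain CHARs
                tokens.add("CHAR(" + c + ")");
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [CHAR(a), CHAR(,), CHAR(b), L_CURLY, DEC_DIGITS(1), COMMA, DEC_DIGITS(3), R_CURLY]
        System.out.println(tokenize("a,b{1,3}"));
    }
}
```

So for the input a,b{1,3} the first ',' should come out as CHAR and the second one as COMMA. What I do not know is how to get an ANTLR-generated lexer to make that distinction.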
EDIT:
I now realize that this is called context-sensitive lexing. Is this possible with ANTLR?