I want the token text for 'Foo' to come out as Foo, without the surrounding quotes.

I have the following lexer rule:

STRING_LITERAL
 : '\'' ( ~'\'' | '\'\'' )* '\''
 ;

But the matched token text still includes the surrounding quotes.

My visitor looks like this:

public override IFilterExpression VisitLiteral_value(MagicHubFilterParser.Literal_valueContext context) {
    return MakeExpression(context.GetText());
}

I know I can trim it at that point, but I suspect it would be quicker and cleaner to handle it at the lexer level, if possible.

What's the best way to do this?

  • I think it is quite common to handle this in the visitor. You will have to not only trim the text but also unescape it. However, you could also do these operations in the lexer by calling `setText` on the token right after it is recognized. – CoronA Mar 31 '15 at 20:28
  • I eventually discovered that if I create it as a parser rule, then I have access to the center of the string as a child node. But this seems wrong. Would there be any performance advantage in doing this from the lexer as opposed to just trimming in the visitor? – Kir Apr 01 '15 at 14:39
  • The parser rule solution is very uncommon, though (and probably not performant). If you need a visitor for other tasks, the performance should be comparable. If the visitor does nothing else, it would be best to do the trimming elsewhere (but I would choose the parser, not the lexer). Trimming in the lexer causes problems if you want to report exact line numbers in error messages. – CoronA Apr 01 '15 at 15:11

1 Answer

As @CoronA suggests, it is probably more canonical to do this in the visitor. However, I did figure out how to do it with a parser rule:

stringBody : ( ~'\'' | '\'\'' )*;
stringLiteral
  : '\'' body=stringBody '\''
  ;
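
For comparison, the visitor-side approach from the comments could look roughly like the sketch below. `StringLiteralText` and `Unquote` are hypothetical names, not part of ANTLR or the original grammar. The sketch assumes the raw text is a complete STRING_LITERAL token in which a doubled quote ('') stands for one literal quote, as the lexer rule implies.

```csharp
using System;

public static class StringLiteralText
{
    // Hypothetical helper: strips the surrounding single quotes from the raw
    // token text and collapses each doubled quote ('') into a single quote.
    public static string Unquote(string raw)
    {
        if (raw == null || raw.Length < 2 || raw[0] != '\'' || raw[raw.Length - 1] != '\'')
        {
            throw new ArgumentException("Expected a single-quoted literal", nameof(raw));
        }

        // Drop the first and last character, then unescape doubled quotes.
        return raw.Substring(1, raw.Length - 2).Replace("''", "'");
    }
}
```

In the visitor from the question this would be called as `MakeExpression(StringLiteralText.Unquote(context.GetText()))`, provided the context really holds a string literal and not another kind of literal value.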