When I get the token with these rules:

STRINGA :   '"' (options {greedy=false;}: ESC | .)* '"';
STRINGB :   '\'' (options {greedy=false;}: ESC | .)* '\'';

it ends up grabbing 'text' (quotes included) instead of just text. I can easily strip the surrounding quotes myself, but I was wondering how I can get ANTLR to remove them.

Dean Hiller

2 Answers

One approach is to define the string contents as a separate category, for example

STRINGA : '"' STRINGCONTENTS '"';
STRINGB : '\'' STRINGCONTENTS '\'';

then capture the STRINGCONTENTS value.

Ray Toal

You will need some custom code for that. Also, you shouldn't use a . (dot) inside the rule: explicitly define that you want to match everything except a backslash (assuming that is what your ESC starts with), a quote, and probably line break characters.

Something like this would do it:

grammar T;

parse
 : STRING EOF {System.out.println($STRING.text);}
 ;

STRING
 : '"' (ESQ | ~('"' | '\\' | '\r' | '\n'))* '"'
   {
     String matched = getText();
     StringBuilder builder = new StringBuilder();

     for(int i = 1; i < matched.length() - 1; i++) {
       char ch = matched.charAt(i);
       if(ch == '\\') {
         i++;
         ch = matched.charAt(i);
         switch(ch) {
           case 'n': builder.append('\n'); break;
           case 't': builder.append('\t'); break;
           default: builder.append(ch); break;
         }
       }
       else {
         builder.append(ch);
       }
     }

     setText(builder.toString());
   }
 ;

fragment ESQ
 : '\\' ('n' | 't' | '"' | '\\')
 ;

If you now parse the input "tabs:'\t\t\t'\nquote:\"\nbackslash:\\", the following will be printed to the console:

tabs:'         '
quote:"
backslash:\

To keep the grammar clean, you could of course move the code into a custom method:

grammar T;

@lexer::members {
  private String fix(String str) {
    ...
  }
}

parse
 : STRING EOF {System.out.println($STRING.text);}
 ;

STRING
 : '"' (ESQ | ~('"' | '\\' | '\r' | '\n'))* '"' {setText(fix(getText()));}
 ;

fragment ESQ
 : '\\' ('n' | 't' | '"' | '\\')
 ;
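
If you want to try the unescaping logic outside of ANTLR, here is a minimal standalone sketch of what such a method could look like. It mirrors the loop from the inline action in the first grammar. The class name StringUnescaper is hypothetical and only used for illustration, and the code assumes the input is a token the STRING rule has already matched (surrounding quotes present, every backslash followed by another character).

```java
// Hypothetical helper class; the name StringUnescaper is an assumption
// made for this sketch and does not come from ANTLR or the grammar above.
final class StringUnescaper {

    // Strips the surrounding quotes from a matched token and resolves
    // the escapes allowed by the ESQ rule: \n, \t, \" and \\.
    static String fix(String matched) {
        if (matched == null || matched.length() < 2) {
            throw new IllegalArgumentException("expected a quoted token");
        }
        StringBuilder builder = new StringBuilder();
        // Start at 1 and stop before the last char to skip both quotes.
        for (int i = 1; i < matched.length() - 1; i++) {
            char ch = matched.charAt(i);
            if (ch == '\\') {
                // Assumption: the lexer rule guarantees that a character
                // follows every backslash, so this lookup stays in range.
                i++;
                ch = matched.charAt(i);
                switch (ch) {
                    case 'n': builder.append('\n'); break;
                    case 't': builder.append('\t'); break;
                    default: builder.append(ch); break; // \" and \\
                }
            } else {
                builder.append(ch);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // The raw token text is: "a\tb" (quotes and backslash included).
        String token = "\"a\\tb\"";
        System.out.println(fix(token));
    }
}
```

The body of fix could then be pasted into the @lexer::members section, since it only depends on the matched text and not on any lexer state.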
Bart Kiers