How to get a parameter to the ANTLR lexer object?

Question

I'm writing Java software to parse SQL queries, using ANTLR with the presto.g4 grammar. The code I'm currently using is pretty standard:

PrestoLexer lexer = new PrestoLexer(
    new CaseChangingCharStream(CharStreams.fromString(query), true));

lexer.removeErrorListeners();
lexer.addErrorListener(errorListener);

CommonTokenStream tokens = new CommonTokenStream(lexer);
PrestoParser parser = new PrestoParser(tokens);

Is it possible to pass a parameter to the lexer so that the lexing differs depending on that parameter?

Update: I followed @Mike's suggestion below. My lexer now inherits from the built-in lexer, and I've added a predicate function. My remaining issue is purely about the grammar.

This is my string definition:


STRING
    : '\'' ( '\\' .
           | '\\\\'  .  {HelperUtils.isNeedSpecialEscaping(this)}?       // match \ followed by any char
           | ~[\\']       // match anything other than \ and '
           | '\'\''       // match ''
           )*
      '\''
    ;

I sometimes have a query with weird escaping for which the predicate returns true. For example:


select 
table1(replace(replace(some_col,'\\'',''),'\"' ,'')) as features 
from table1

When I try to parse it, I get '\'',''),' as a single string. How can I handle this?

Mike Lischke · Accepted Answer · 2020-12-30T14:15:11.947

I don't know what you need the parameter for, but you mentioned SQL, so let me present a solution I have used for years: predicates.

In MySQL (which is the dialect I work with) the syntax differs depending on the MySQL version number. So in my grammar I use semantic predicates to switch off and on language parts that belong to a specific version. The approach is simple:

test:
    {serverVersion < 80014}? ADMIN_SYMBOL
    | ONLY_SYMBOL
;

The ADMIN keyword is only acceptable for version < 8.0.14 (just an example, not true in reality), while the ONLY keyword is a possible alternative in any version.
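The predicate compares against 80014 for version 8.0.14, which suggests the version is stored as a single integer of the form major * 10000 + minor * 100 + patch. A minimal sketch of producing such a number from a version string, assuming that encoding (the class and method names here are hypothetical and not taken from the MySQL Workbench code):

```java
/** Hypothetical helper: turns "major.minor.patch" into one comparable integer. */
final class ServerVersion {

    private ServerVersion() {
    }

    /** Encodes "8.0.14" as 80014, assuming major * 10000 + minor * 100 + patch. */
    static int encode(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected major.minor.patch, got: " + version);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = Integer.parseInt(parts[2]);
        // Minor and patch must fit in two decimal digits for the encoding to stay ordered.
        if (major < 0 || minor < 0 || minor > 99 || patch < 0 || patch > 99) {
            throw new IllegalArgumentException("Version component out of range: " + version);
        }
        return major * 10000 + minor * 100 + patch;
    }

    public static void main(String[] args) {
        int serverVersion = encode("8.0.13");
        // Mirrors the grammar predicate {serverVersion < 80014}?
        System.out.println(serverVersion < 80014);
    }
}
```

With the version held as a plain int, the predicate in the grammar stays a simple comparison.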

The variable serverVersion is a member of a base class from which I derive my parser. The base class is specified in the grammar options:

options {
    superClass = MySQLBaseRecognizer;
    tokenVocab = MySQLLexer;
}

The lexer is also derived from that class, so the version number is available in both the lexer and the parser (in addition to other important settings like the SQL mode). With this approach you can also implement more complex predicate functions that need additional processing.
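For a Java target, one possible way to share such settings is a small settings object. The following is only a sketch that does not depend on ANTLR: the class, its fields, and its methods are hypothetical, and the two SQL mode names are used purely for illustration. In a real project the class named in superClass would have to extend ANTLR's Lexer (or Parser) and could hold a reference to an object like this, so that a predicate such as {isServerVersionBelow(80014)}? can delegate to it.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical settings holder that lexer and parser base classes could share. */
final class RecognizerSettings {

    /** Illustrative subset of SQL modes that can change how input is lexed. */
    enum SqlMode { ANSI_QUOTES, NO_BACKSLASH_ESCAPES }

    private final int serverVersion;
    private final Set<SqlMode> sqlModes = EnumSet.noneOf(SqlMode.class);

    RecognizerSettings(int serverVersion, SqlMode... modes) {
        this.serverVersion = serverVersion;
        this.sqlModes.addAll(Arrays.asList(modes));
    }

    /** Candidate predicate: true when the configured version is older than the given one. */
    boolean isServerVersionBelow(int version) {
        return serverVersion < version;
    }

    /** Candidate predicate: true when a backslash should be treated as an escape character. */
    boolean backslashEscapesEnabled() {
        return !sqlModes.contains(SqlMode.NO_BACKSLASH_ESCAPES);
    }

    public static void main(String[] args) {
        RecognizerSettings settings =
            new RecognizerSettings(80013, SqlMode.NO_BACKSLASH_ESCAPES);
        System.out.println(settings.isServerVersionBelow(80014));
        System.out.println(settings.backslashEscapesEnabled());
    }
}
```

Java allows only single inheritance, so one class cannot extend both Lexer and Parser. Keeping the values in a separate object that both base classes reference is one way to make the same settings visible on both sides.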

You can find the full code and grammars in the MySQL Workbench GitHub repository.

thanks @Mike! this was in fact very useful. I have other issue with the parsing now. I'll elaborate in the original post. — Nir99, Dec 30 '20 at 11:31
@Nir99 better open a new question for the grammar part on SO, which will allow others to chime in for solutions. — Mike Lischke, Dec 30 '20 at 14:17
already opened one- https://stackoverflow.com/questions/65506538/handling-different-escaping-sequences — Nir99, Dec 30 '20 at 14:22

score 0 · Answer 2 · answered Dec 29 '20 at 12:17

Is it possible to pass a parameter to the lexer so that the lexing differs depending on that parameter?

No, the lexer works independently from the parser. You cannot direct the lexer while parsing.

Bart Kiers


I see. Thank you for that answer! – Nir99 Dec 29 '20 at 12:51
