
I've tried to implement a LaTeX-like grammar that would let me parse this kind of sentence:

\title{Un pré é"'§è" \VAR state \draw( 200\if{expression kjlkjé} ) bis tèr }

As you can see, \title{ } can contain several kinds of items:

  • a UTF-8 string without quotes and with whitespace, which I'd like to keep as a single token

  • a variable call of the form \variable_name

  • a \keyword followed by parentheses or by braces, for instance \draw( utf8 \var \if{ } ... ) or \if{ idem }

These items can be nested.

I took inspiration from the XML parser presented in the ANTLR 4 book and tried to use lexer modes. I have a problem recognizing the closing braces and closing parentheses. I also have a problem with some whitespace, for instance the space that follows \variable_name (I get: extraneous input ' ').

Here is my lexer grammar:

 lexer grammar OEFLexer;
    // Default mode rules (the SEA)
    SEA_WS      :   (' '|'\t'|'\r'? '\n')+ ;
    TITLE : '\\title';
    OB    : '{';
    OP    : '(';
    BSLASH  : '\\'                  -> mode(CALLREFERENCE) ;      
    TEXT  : ~[\\({]+;                         // clump all text together 
    // ----------------- Everything Callreference ---------------------
    mode CALLREFERENCE;

    CLOSECALLVAR : ' '          -> mode(DEFAULT_MODE) ; // back to SEA mode 
    CB           : '}'          -> mode(DEFAULT_MODE) ; // back to SEA mode 
    CP           : ')'          -> mode(DEFAULT_MODE) ; // back to SEA mode 

    DRAW    :   'draw' OP;
    IF      :   'if' OB;
    ID      :   [a-zA-Z]+ ;       // match/send ID in tag to parser

Here is my parser grammar:

parser grammar OEFParser;
options { tokenVocab=OEFLexer; }

document: TITLE OB ( callreference | string )* CB;

string  : TEXT;
var     : ID;
commandDraw : DRAW ( callreference | string )* CP ;
commandIf   : IF ( callreference | string )* CB ;

callreference : BSLASH ID | BSLASH commandDraw CP | BSLASH commandIf CP;

When I try to parse the \title code shown at the beginning, I get:

line 1:25 extraneous input ' ' expecting {'\', TEXT, '}'}
line 1:37 extraneous input ' ' expecting {'\', TEXT, ')'}
line 1:45 mismatched input 'expression' expecting {'\', TEXT, '}'}
line 1:75 extraneous input '<EOF>' expecting {'\', TEXT, ')'}

And this is the parse tree generated by grun:

[image: parse tree for the question's grammar]

Thanks for helping me tackle this issue. Chris

Bart Kiers
chrisb06

1 Answer

The problem is the space after expression:

\title{Un pré é"'§è" \VAR state \draw( 200\if{expression kjlkjé} ) bis tèr }
                                                        ^
                                                        ^
                                                        ^

which switches the lexer back to DEFAULT_MODE:

CLOSECALLVAR : ' ' -> mode(DEFAULT_MODE) ;

That is not what you want, because at that point you are still inside the CALLREFERENCE context.

One way to handle this is to use the -> pushMode(...) and -> popMode commands, which maintain a stack of modes. Whenever you encounter a \... ( or \... { you push a new CALLREFERENCE onto this stack, and you pop one off when you see a ) or }.

A quick lexer grammar demo:

lexer grammar OEFLexer;

TITLE   : '\\title' S? OB -> pushMode(CALLREFERENCE);

fragment OB : '{';
fragment OP : '(';
fragment S : [ \t\r\n]+;

mode CALLREFERENCE;

  CB       : '}'          -> popMode;
  CP       : ')'          -> popMode;

  DRAW     : '\\draw' S? OP -> pushMode(CALLREFERENCE);
  IF       : '\\if' S? OB   -> pushMode(CALLREFERENCE);

  BSLASH   : '\\';
  ID       : [a-zA-Z]+;
  CR_OTHER : .;

and the parser grammar:

parser grammar OEFParser;

options { tokenVocab=OEFLexer; }

document
 : TITLE ( callreference | string )* CB EOF
 ;

string
 : CR_OTHER+
 | ID
 ;

commandDraw
 : DRAW ( callreference | string )* CP
 ;

commandIf
 : IF ( callreference | string )* CB
 ;

callreference
 : BSLASH ID
 | commandDraw
 | commandIf
 ;

Parsing your example input will result in the following parse tree:

[image: parse tree for the example input]
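
To see how the mode stack evolves on the example input, here is a rough simulation in Python. It is not ANTLR and not generated code: `RULES`, `tokenize` and `SAMPLE` are names made up for this illustration, and the regular expressions only approximate the lexer rules above, assuming the usual lexer behaviour that the longest match wins and that the earlier rule wins a tie.

```python
import re

# Hypothetical rule table mirroring the lexer grammar above.
# Each entry: (token name, compiled regex, action), action is "push", "pop" or None.
WS = r"[ \t\r\n]*"
RULES = {
    "DEFAULT": [
        ("TITLE", re.compile(r"\\title" + WS + r"\{"), "push"),
    ],
    "CALLREFERENCE": [
        ("CB", re.compile(r"\}"), "pop"),
        ("CP", re.compile(r"\)"), "pop"),
        ("DRAW", re.compile(r"\\draw" + WS + r"\("), "push"),
        ("IF", re.compile(r"\\if" + WS + r"\{"), "push"),
        ("BSLASH", re.compile(r"\\"), None),
        ("ID", re.compile(r"[a-zA-Z]+"), None),
        ("CR_OTHER", re.compile(r".", re.DOTALL), None),
    ],
}

SAMPLE = r"""\title{Un pré é"'§è" \VAR state \draw( 200\if{expression kjlkjé} ) bis tèr }"""


def tokenize(text):
    """Return (tokens, final_stack); each token is (name, text, depth).

    depth is the number of CALLREFERENCE modes on the stack after the
    token's push/pop action has been applied.
    """
    stack = ["DEFAULT"]
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, regex, action in RULES[stack[-1]]:
            m = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m, action)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos} in mode {stack[-1]}")
        name, m, action = best
        if action == "push":
            stack.append("CALLREFERENCE")
        elif action == "pop":
            stack.pop()
        tokens.append((name, m.group(), len(stack) - 1))
        pos = m.end()
    return tokens, stack


if __name__ == "__main__":
    toks, final_stack = tokenize(SAMPLE)
    for name, text, depth in toks:
        if name in {"TITLE", "DRAW", "IF", "CB", "CP"}:
            print(f"{name:6} {text!r:12} depth={depth}")
    print("final stack:", final_stack)
```

Because the space after `expression` is matched as an ordinary `CR_OTHER` token and triggers no mode change, the lexer stays inside the nested `\if{` context until the matching `}` pops it.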

Bart Kiers