
I'm trying to write a lexer/parser for R6RS, and I'm stuck on the datum-skipping comment (#;).

Here is part of my lexer/parser rules:

BOOLEAN: '#t' | '#f' | '#T' | '#F';
NUMBER: DIGIT+; // TODO: incomplete
CHAR: '#\\' CHARNAME | '#\\x' HEXDIGIT+ | '#\\' . ;
STRING: '"' STRELEMENT* '"';
IDENTIFIER: INITIAL SUBSEQUENT* | PERCULIAR_ID;

COMMENT: (';' .*? LINE_ENDING | '#!r6rs' ) -> skip;
NESTED_COMMENT: '#|' (NESTED_COMMENT | ~[#|] | ('|' ~'#') | ('#' ~'|') )* '|#' -> skip;

datum: lexemeDatum
     | compoundDatum;
compoundDatum: list
             | vector
             | bytevector;

// (rest omitted...)

Now, I want to write something like skipDatum: '#;' datum -> skip. Unfortunately, a parser rule doesn't allow -> skip. SKIPDATUM: '#;' datum -> skip wouldn't work either, because a lexer rule can't reference a parser rule.

In my opinion, while "commenting out" is the responsibility of the lexer and "constructing a datum" is the responsibility of the parser, the rule for #; needs both.
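To illustrate why #; straddles both phases, here is a minimal hand-written sketch in Java (not ANTLR, and not part of my grammar). The class DatumCommentSketch and its methods are hypothetical names, and it only understands atoms, parentheses and #;. The lexer step can only emit #; as a token, because it cannot know where the following datum ends; the parser step is what actually discards the datum.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini-reader: handles only atoms, "(", ")" and "#;". */
final class DatumCommentSketch {
    private final List<String> tokens;
    private int pos = 0;

    DatumCommentSketch(String src) { this.tokens = tokenize(src); }

    /** Lexer step: "#;" is emitted as an ordinary token. */
    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(' || c == ')') {
                out.add(String.valueOf(c));
                i++;
            } else if (src.startsWith("#;", i)) {
                out.add("#;");
                i += 2;
            } else {
                int start = i;
                while (i < src.length() && !Character.isWhitespace(src.charAt(i))
                        && src.charAt(i) != '(' && src.charAt(i) != ')') {
                    i++;
                }
                out.add(src.substring(start, i));
            }
        }
        return out;
    }

    /** Parser step: each "#;" consumes one full datum and throws it away. */
    private void skipDatumComments() {
        while (pos < tokens.size() && tokens.get(pos).equals("#;")) {
            pos++;        // consume "#;"
            readDatum();  // parse the commented-out datum, ignore the result
        }
    }

    /** Reads one datum: a String atom or a List of data. No error handling for unbalanced input. */
    Object readDatum() {
        skipDatumComments();
        String tok = tokens.get(pos++);
        if (!tok.equals("(")) {
            return tok;
        }
        List<Object> items = new ArrayList<>();
        while (true) {
            skipDatumComments();  // also covers "#; datum" right before ")"
            if (tokens.get(pos).equals(")")) {
                pos++;
                return items;
            }
            items.add(readDatum());
        }
    }

    static Object read(String src) {
        return new DatumCommentSketch(src).readDatum();
    }

    public static void main(String[] args) {
        System.out.println(read("(a #;(b c) d #;e)")); // prints [a, d]
    }
}
```

Note that skipDatumComments has to run both before a datum and before a closing parenthesis, which is the same bookkeeping that shows up as skipDatum* in grammar rules.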

Here is my current solution:

skipDatum: '#;' datum;

list: '(' (datum|skipDatum)* ')' #ProperListDatum
    | '[' (datum|skipDatum)* ']' #ProperListDatum
    | '(' skipDatum* datum (datum|skipDatum)* '.' skipDatum* datum skipDatum* ')' #ImproperListDatum
    | '[' skipDatum* datum (datum|skipDatum)* '.' skipDatum* datum skipDatum* ']' #ImproperListDatum
    ;

While this works, it seems ugly: wherever I really just want to write a rule in terms of datum, I have to write skipDatum* datum skipDatum* instead.

Is there any better solution? Thanks in advance.

Venusaur

1 Answer

You could use something like this.

datum
    :   SKIP_DATUM? ...
    ;

SKIP_DATUM : '#;';

This simplifies the grammar, but you would need to perform the following check every time you use a DatumContext in the generated code.

if (ctx.SKIP_DATUM() != null) {
    // handle skipped datum here (return?)
}
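
One way to keep that check from spreading through the visitor is to do it in a single helper. The following is a minimal sketch under stated assumptions: DatumNode is a hypothetical stand-in for the generated DatumContext (it is not the ANTLR API), and its skipped flag plays the role of ctx.SKIP_DATUM() != null.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the generated DatumContext; not the ANTLR API. */
final class DatumNode {
    final boolean skipped;           // true when the datum was prefixed with '#;'
    final String text;               // atom text, or null for a list
    final List<DatumNode> children;  // list elements, or null for an atom

    private DatumNode(boolean skipped, String text, List<DatumNode> children) {
        this.skipped = skipped;
        this.text = text;
        this.children = children;
    }

    static DatumNode atom(String text) {
        return new DatumNode(false, text, null);
    }

    static DatumNode list(DatumNode... children) {
        return new DatumNode(false, null, Arrays.asList(children));
    }

    /** Returns a copy of this node marked as commented out. */
    DatumNode skip() {
        return new DatumNode(true, text, children);
    }
}

final class SkipFilterSketch {
    /** Renders a datum, dropping every child that carries the skip marker. */
    static String render(DatumNode d) {
        if (d.children == null) {
            return d.text;
        }
        StringBuilder sb = new StringBuilder("(");
        for (DatumNode child : d.children) {
            if (child.skipped) {
                continue;  // the one place where skipped data are filtered out
            }
            if (sb.length() > 1) {
                sb.append(' ');
            }
            sb.append(render(child));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        DatumNode tree = DatumNode.list(
                DatumNode.atom("a"),
                DatumNode.list(DatumNode.atom("b"), DatumNode.atom("c")).skip(),
                DatumNode.atom("d"));
        System.out.println(render(tree)); // prints (a d)
    }
}
```

In a real visitor the same idea would mean filtering the child datum contexts once, where children are iterated, rather than testing the marker in every visit method.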
Sam Harwell
  • Although it seems to work, it looks like this just moves the "ugliness" into the Visitor class. I'll try this method while I keep looking for better ways. Thanks for the answer. – Venusaur Apr 23 '13 at 15:37
  • I decided to keep the "complex work" in the parser rules, but with the help of predicates like: `list: '[' (d1+=datum | skipDatum)+ '.' (d2+=datum | skipDatum)+ ']' { $d1.size()>0 && $d2.size()==1 }? #ImproperListDatum`. I hadn't noticed that I can insert "code" in my rule file. Although I didn't follow your suggestion, thanks for the answer. – Venusaur Apr 26 '13 at 01:46