4

Is there any way to express this in ANTLR4:

Any string as long as it doesn't contain the asterisk immediately followed by a forward slash?

This doesn't work: (~'*/')* because ANTLR throws this error: multi-character literals are not allowed in lexer sets: '*/'

This works but isn't correct: (~[*/])* because it rejects any string containing either * or / on its own.

james.garriss
Roger Costello

3 Answers

6

I had a similar problem; my solution: ( ~'*' | ( '*'+ ~[/*]) )* '*'*.
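To see that this pattern accepts exactly the strings without "*/", here is a rough Java sketch that transcribes it into a java.util.regex expression. The class and method names (NoStarSlash, isFreeOfStarSlash) are made up for illustration and are not part of ANTLR; the regex is my own translation of the lexer pattern, not something ANTLR generates.

```java
import java.util.regex.Pattern;

final class NoStarSlash {
    // Hand-translated regex counterpart of the lexer pattern
    //   ( ~'*' | ( '*'+ ~[/*] ) )* '*'*
    // i.e. any non-star character, or a run of stars followed by a character
    // that is neither '/' nor '*', repeated; then optional trailing stars.
    private static final Pattern NO_STAR_SLASH =
            Pattern.compile("(?:[^*]|\\*+[^/*])*\\**");

    // Hypothetical helper: true when the whole string matches the pattern.
    static boolean isFreeOfStarSlash(String s) {
        return NO_STAR_SLASH.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"a * b / c", "trailing stars **", "bad */ here"};
        for (String s : samples) {
            System.out.println(s + " -> " + isFreeOfStarSlash(s));
        }
    }
}
```

Lone * and / characters pass, as do trailing stars; only a star directly followed by a slash makes the match fail.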

  • If you've coded more than one lexer for languages with /*....*/ comments, you should already know this trick. – Ira Baxter Jun 25 '16 at 11:04
2

The closest I can come is to put the test in the parser instead of the lexer. That's not exactly what you're asking for, but it does work.

The trick is to use a semantic predicate before any string that must be tested for any Evil Characters. The actual testing is done in Java.

grammar myTest;

@header
{
    import java.util.*;
}

@parser::members
{
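    // Despite its name, this returns true when the input is clean (contains
    // no "*/"), so the semantic predicates below pass only for clean tokens.
    boolean hasEvilCharacters(String input)
    {
        return !input.contains("*/");
    }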
}

// Mimics a very simple sentence, such as: 
//   I am clean.
//   I have evil char*/acters.
myTest
    : { hasEvilCharacters(_input.LT(1).getText()) }? String 
      (Space { hasEvilCharacters(_input.LT(1).getText()) }? String)* 
      Period EOF
    ;

String
    : ('A'..'Z' | 'a'..'z')+      
    ;

Space
    : ' '
    ;

Period
    : '.'
    ;

Tested with ANTLR 4.4 via the TestRig in ANTLRWorks 2 in NetBeans 8.0.1.

james.garriss
  • Thanks James! Wow, that is a huge amount of work to solve such a simple problem. – Roger Costello Apr 16 '15 at 13:38
  • Can somebody explain how this works? I don't see where it captures a token that contains a "*". (I – Ira Baxter Apr 17 '15 at 07:33
  • Look in the parse method `hasEvilCharacters()`, which is called by the semantic predicate in the `myTest` parser rule. For more info, read chapter 10 in The Definitive ANTLR 4 Reference. – james.garriss Apr 17 '15 at 11:03
1

If there are only a few disallowed sequences, there is a solution without parser/lexer actions:

grammar NotParser;

program
    : (starslash | notstarslash)+
    ; 

notstarslash
    : NOT_STAR_SLASH
    ;

starslash
    : STAR_SLASH
    ;

STAR_SLASH
    : '*'+ '/'
    ;

NOT_STAR_SLASH
    : (F_NOT_STAR_SLASH | F_STAR_NOT_SLASH) +
    ;

fragment F_NOT_STAR_SLASH
    : ~('*'|'/')
    ;

fragment F_STAR_NOT_SLASH
    : '*'+ ~('*'|'/')
    | '*'+ EOF
    | '/'
    ;

The idea is to compose the token of

  • all characters that are neither '*' nor '/'
  • all runs of '*' that are not followed by '/', as well as any single '/'

There are some rules that deal with special situations (multiple '*' followed by '/', or trailing '*').
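As an illustration of how the input is meant to be split, here is a hand-written Java sketch that mimics the two token types. It is only an approximation for reasoning about the grammar, not the lexer ANTLR generates; the class name StarSlashSplitter and the "TYPE:text" string format are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class StarSlashSplitter {
    // Splits the input into pieces labelled like the grammar's two token types:
    // STAR_SLASH for a run of '*' closed by '/', NOT_STAR_SLASH for everything else.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder pending = new StringBuilder(); // text of the current NOT_STAR_SLASH piece
        int i = 0;
        while (i < input.length()) {
            // Find the end of a (possibly empty) run of stars starting at i.
            int j = i;
            while (j < input.length() && input.charAt(j) == '*') {
                j++;
            }
            if (j > i && j < input.length() && input.charAt(j) == '/') {
                // One or more stars directly followed by '/': a STAR_SLASH piece.
                if (pending.length() > 0) {
                    tokens.add("NOT_STAR_SLASH:" + pending);
                    pending.setLength(0);
                }
                tokens.add("STAR_SLASH:" + input.substring(i, j + 1));
                i = j + 1;
            } else if (j > i) {
                // Stars followed by something other than '/', or by end of input.
                pending.append(input, i, j);
                i = j;
            } else {
                // Any non-star character, including a single '/'.
                pending.append(input.charAt(i));
                i++;
            }
        }
        if (pending.length() > 0) {
            tokens.add("NOT_STAR_SLASH:" + pending);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("I have evil char*/acters."));
    }
}
```

An input is free of "*/" exactly when the resulting list contains no STAR_SLASH entry, which corresponds to a parse of the grammar above that never goes through the starslash rule.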

CoronA
  • ... what happens with ANTLR if the last character in a file is an "*"? What does the ~'/' test do or accept? – Ira Baxter Apr 17 '15 at 06:31
  • Should do now with the special cases (trailing '*', single '/', multiple '*'). Maybe the parse tree does not match the expectations ... but that is hard to tune without knowing the application. – CoronA Apr 17 '15 at 07:07
  • The complexity and non-scalability of your answer and mine indicates a weakness--or is it an opportunity?--in ANTLR 4. – james.garriss Apr 17 '15 at 11:07