
I'm trying to write a grammar for various time formats (12:30, 0945, 1:30-2:45, ...) using ANTLR. So far it works like a charm as long as I don't type in characters that haven't been defined in the grammar file.

I'm using the following JUnit test for example:

    final CharStream stream = new ANTLRStringStream("12:40-1300,15:123-18:59");
    final TimeGrammarLexer lexer = new TimeGrammarLexer(stream);
    final CommonTokenStream tokenStream = new CommonTokenStream(lexer);
    final TimeGrammarParser parser = new TimeGrammarParser(tokenStream);

    try {
        final timeGrammar_return tree = parser.timeGrammar();
        fail();
    } catch (final Exception e) {
        assertNotNull(e);
    }

An Exception gets thrown (as expected) because "15:123" isn't valid. If I try ("15:23a") though, no exception gets thrown and ANTLR treats it like a valid input.

Now if I define characters in my grammar, ANTLR seems to notice them and I once again get the exception I want:

  CHAR: ('a'..'z')|('A'..'Z');

But how do I exclude umlauts, symbols and other characters a user is able to type in (äöü{%&<>!)? So basically I'm looking for some kind of syntax that says: match everything BUT "0..9,:-"

black666

2 Answers


...
So basically I'm looking for some kind of syntax that says: match everything BUT "0..9,:-"

The following rule matches any single character except a digit, ',', ':' and '-':

    Foo
      :  ~('0'..'9' | ',' | ':' | '-')
      ;

(the ~ negates single characters inside lexer-rules)
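
To sanity-check which inputs the Foo rule above would pick up, the same character set can be mirrored in plain Java. This is only a sketch of the set logic and does not use the ANTLR runtime; the names NegatedSetSketch, matchesFoo and firstInvalidIndex are made up for illustration.

```java
// Plain-Java mirror of the lexer set ~('0'..'9' | ',' | ':' | '-').
// NegatedSetSketch, matchesFoo and firstInvalidIndex are hypothetical names, not ANTLR API.
final class NegatedSetSketch {

    // True when c is outside the allowed time characters, i.e. when Foo would match it.
    static boolean matchesFoo(char c) {
        boolean allowed = (c >= '0' && c <= '9') || c == ',' || c == ':' || c == '-';
        return !allowed;
    }

    // Returns the index of the first character Foo would match, or -1 if there is none.
    static int firstInvalidIndex(String input) {
        for (int i = 0; i < input.length(); i++) {
            if (matchesFoo(input.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalidIndex("12:40-1300,15:23")); // only allowed characters
        System.out.println(firstInvalidIndex("15:23a"));           // trailing letter
        System.out.println(firstInvalidIndex("15:23\u00E4"));      // trailing umlaut (ä)
    }
}
```

Letters, umlauts and symbols all fall into the negated set, so a single rule covers them without listing each one.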

But you might want to post your entire grammar: I get the impression there are some other things you're not doing as they should have been done. Your call.

Bart Kiers

You can define a lexer rule (token) that matches all the characters you do not want. If this token is not used in any of your parser rules, ANTLR will throw a NoViableAltException when it encounters one.

For Unicode characters up to U+00FF, this could look like this:

    UTF8
      :  ( '\u0000'..'\u002A'   // control characters, space, ! to *
         | '\u002E'..'\u002F'   // . /
         | '\u003B'..'\u00FF'   // ; < = > ? @, letters, brackets, umlauts and other Latin-1 characters
         )
      ;
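
When listing ranges by hand it is easy to leave gaps, so it can help to check the coverage outside the grammar. The following plain-Java sketch mirrors the three ranges of the UTF8 rule above without using ANTLR; RangeCoverageSketch and matchesUtf8Rule are hypothetical names chosen for illustration.

```java
// Plain-Java mirror of the three ranges in the UTF8 rule.
// RangeCoverageSketch and matchesUtf8Rule are hypothetical names, not ANTLR API.
final class RangeCoverageSketch {

    // True when c lies in one of the ranges listed in the UTF8 rule.
    static boolean matchesUtf8Rule(char c) {
        return (c >= 0x00 && c <= 0x2A)   // '\u0000'..'\u002A'
            || (c >= 0x2E && c <= 0x2F)   // '\u002E'..'\u002F'
            || (c >= 0x3B && c <= 0xFF);  // '\u003B'..'\u00FF'
    }

    public static void main(String[] args) {
        char[] samples = { 'a', (char) 0xE4, '+', (char) 0x20AC, '5' };
        for (char c : samples) {
            // Print the code point in hex next to the result of the range check.
            System.out.println(String.format("U+%04X", (int) c) + " -> " + matchesUtf8Rule(c));
        }
    }
}
```

Two things show up with these ranges as written: '+' (U+002B) sits in the gap between the first two ranges, and anything above U+00FF, such as the euro sign (U+20AC), is not covered. If those characters should be rejected as well, the ranges need to be extended accordingly.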
nebenmir