Range quantifier syntax in ANTLR Regex

Question

This should be fairly simple. I'm working on a lexer grammar using ANTLR, and want to limit the maximum length of variable identifiers to 30 characters. I attempted to accomplish this with the following line (normal regex syntax, except for the '' quoting):

ID  :   ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'){0,29}  {System.out.println("IDENTIFIER FOUND.");}
    ;

No errors in code generation, but compilation failed due to a line in the generated code that was simply:

0,29

Obviously ANTLR is taking the text between the braces and placing it in the accept-state code along with the print line. I searched the ANTLR site and found no example of, or reference to, an equivalent expression. What should the syntax of this expression be?

score 10 · Accepted Answer · edited Aug 30 '12 at 06:51

ANTLR does not support the {m,n} quantifier syntax. ANTLR sees the {} of your quantifier and can't tell them apart from the {} that surround your actions.

Workarounds:

1. Enforce the limit semantically. Let it gather an unlimited size ID and then complain/truncate it as part of your action code or later in the compiler.
2. Create the quantification rules manually.

This is an example of a manual rule that limits IDs to 8 characters (one leading letter plus up to seven SUBID characters).

// 'fragment' stops the lexer from emitting SUBID as a token of its own.
fragment
SUBID : ('a'..'z'|'A'..'Z'|'0'..'9'|'_')
      ;
ID : ('a'..'z'|'A'..'Z')
     (SUBID (SUBID (SUBID (SUBID (SUBID (SUBID SUBID?)?)?)?)?)?)?
   ;
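
For a larger limit, such as the 30 characters in the question, writing the nesting by hand gets error-prone. Below is a minimal sketch in Java that generates the nested optional group as text, which could then be pasted into the grammar. The class and method names (NestedOptionalRule, nestedOptional) are hypothetical helpers for illustration, not part of ANTLR.

```java
// Hypothetical helper (not part of ANTLR): builds the nested optional group
// "(X (X ... X?)?)?" that allows between 0 and maxRepeats occurrences of X.
final class NestedOptionalRule {

    static String nestedOptional(String fragmentName, int maxRepeats) {
        if (maxRepeats < 1) {
            throw new IllegalArgumentException("maxRepeats must be at least 1");
        }
        StringBuilder sb = new StringBuilder();
        // Open one group per occurrence except the innermost one.
        for (int i = 1; i < maxRepeats; i++) {
            sb.append('(').append(fragmentName).append(' ');
        }
        // The innermost occurrence is a plain optional reference.
        sb.append(fragmentName).append('?');
        // Close every group opened above and make each one optional.
        for (int i = 1; i < maxRepeats; i++) {
            sb.append(")?");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One leading letter plus up to 29 SUBID characters gives a 30-character limit.
        System.out.println("ID : ('a'..'z'|'A'..'Z') " + nestedOptional("SUBID", 29) + " ;");
    }
}
```

Calling nestedOptional("SUBID", 7) reproduces the nested group in the 8-character rule above.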

Personally, I'd go with the semantic solution (#1). There is very little reason these days to limit the identifiers in a language, and even less reason to cause a syntax error (early abort of the compile) when such a rule is violated.
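
As a rough sketch of the semantic approach, assuming Java as the action language (as in the question): the lexer rule accepts an ID of any length, and a check like the one below runs on the matched text, either from the rule's action or in a later compiler pass. IdentifierLimit, MAX_ID_LENGTH, isWithinLimit and truncate are illustrative names, not ANTLR API.

```java
// Hypothetical helper for enforcing an identifier length limit after lexing.
// None of these names come from ANTLR; they are assumptions for illustration.
final class IdentifierLimit {

    static final int MAX_ID_LENGTH = 30;

    /** Returns true when the identifier text fits within the limit. */
    static boolean isWithinLimit(String text) {
        return text.length() <= MAX_ID_LENGTH;
    }

    /** Returns the text unchanged if it fits, otherwise its first MAX_ID_LENGTH characters. */
    static String truncate(String text) {
        return isWithinLimit(text) ? text : text.substring(0, MAX_ID_LENGTH);
    }

    public static void main(String[] args) {
        String id = "a_very_long_identifier_name_exceeding_the_limit";
        if (!isWithinLimit(id)) {
            // Report a diagnostic instead of failing the whole lex with a syntax error.
            System.err.println("warning: identifier longer than "
                    + MAX_ID_LENGTH + " characters, truncating");
        }
        System.out.println(truncate(id));
    }
}
```

Whether to warn, truncate, or reject is a language design decision; the point is that the lexer rule itself stays a plain unbounded repetition.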

edited Aug 30 '12 at 06:51

Bart Kiers

answered Aug 30 '12 at 03:08

walrii

If ANTLR were used only to generate compilers for programming languages then there may be little use for quantifiers. But a grammar (schema) for validating any sort of structured data certainly needs them - credit card numbers are 16 digits, not 13 or 25. The ANTLR4 book has JSON and XML grammars, but without basic token constraints it would be difficult to use an ANTLR grammar as an abstract (codec-independent) version of JSON Schema and XSD. – Dave May 08 '16 at 18:38

"There is very little reason these days to limit the identifiers in a language" - that doesn't mean that no languages exist with limited identifiers, nor that people might want to write parsers for them. – Stephen Drew May 20 '17 at 16:03

PostgreSQL limits table names to 63 characters, for example. – TheRealChx101 Jan 01 '21 at 05:40
