Questions tagged [antlr4]

Version 4 of ANother Tool for Language Recognition (ANTLR), a flexible lexer/parser generator. ANTLR4 features an enhanced adaptive LL(*) parsing algorithm that improves on the simpler LL(*) algorithm used in ANTLR3.

ANTLR stands for ANother Tool for Language Recognition, a powerful parser generator for reading, processing, executing, or translating structured text or binary files. At its core, ANTLR uses a grammar, with syntax loosely based on Backus–Naur form, to generate a parser. That parser produces easily traversable parse trees, which the user can process further. ANTLR's simple yet powerful design has allowed it to be used in many projects, from the expression evaluator in Apple's Numbers application¹ to JetBrains' IntelliJ IDEA IDE².

The main improvement from ANTLR3 to ANTLR4 is a change in the parsing algorithm. The new variation of the LL(*) parsing algorithm, called adaptive LL(*), moves all of the grammar analysis effort to runtime, and ANTLR4 can now handle directly left-recursive rules. This new resilience led to the name "Honey Badger", on which Terence Parr had this to say:

ANTLR v4 is called the honey badger release after the fearless hero of the YouTube sensation, "The Crazy Nastyass Honey Badger". To quote the honey badger, ANTLR v4 just doesn't give a damn. It's pretty bad ass. It'll take just about any grammar you give it and parse correctly. And, without backtracking!

-- Terence Parr

(To read more, check out the full conversation!)
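
To make the left-recursion point above concrete: a rule such as expr : expr '*' expr | expr '+' expr | INT ; cannot be parsed by a naive top-down parser, because expr would call itself before consuming any input. ANTLR4 rewrites directly left-recursive rules into a loop guarded by precedence checks, with earlier alternatives binding more tightly. The Python sketch below only illustrates that idea by hand; it is not ANTLR's generated code, and the names tokenize, parse_expr, parse, and PRECEDENCE are invented for this example.

```python
import re

# Binding strength per operator: a higher number binds tighter.
# Mirrors the alternative order in: expr : expr '*' expr | expr '+' expr | INT ;
PRECEDENCE = {"*": 2, "+": 1}


def tokenize(text):
    """Split the input into integer and operator tokens, ignoring whitespace."""
    return re.findall(r"\d+|[*+]", text)


def parse_expr(tokens, pos=0, min_prec=1):
    """Parse tokens starting at pos and return (tree, next_pos).

    The left-recursive alternatives become a loop: parse one primary (INT),
    then keep consuming operator/operand pairs while the next operator
    binds at least as tightly as min_prec.
    """
    tree = int(tokens[pos])  # primary alternative: INT
    pos += 1
    while pos < len(tokens) and PRECEDENCE[tokens[pos]] >= min_prec:
        op = tokens[pos]
        # Left-associative: the right operand must bind strictly tighter.
        right, pos = parse_expr(tokens, pos + 1, PRECEDENCE[op] + 1)
        tree = (op, tree, right)
    return tree, pos


def parse(text):
    """Return a nested-tuple parse tree for a '+'/'*' integer expression."""
    tree, _ = parse_expr(tokenize(text))
    return tree
```

For instance, parse("1 + 2 * 3") groups the multiplication first, and parse("1 + 2 + 3") nests to the left, which is the behaviour a left-recursive grammar rule is normally meant to express.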

If you are interested in learning to use ANTLR4, a good place to start would be the official documentation, which provides an excellent introduction to the library itself.

Further Reading:

1 Sourced from a paper written by Terence Parr himself.

2 Sourced from JetBrains' official list of third-party software in IDEA.

3 On January 24th, 2013, the www.antlr.org address was changed from pointing at the site for ANTLR version 3 (www.antlr3.org) to the site for ANTLR version 4 (www.antlr4.org). Questions and answers that used www.antlr.org before this date were therefore referring to ANTLR 3.x. Such links should be updated to www.antlr3.org for ANTLR 3.x or www.antlr4.org for ANTLR 4.x.

3877 questions
10
votes
1 answer

Rule reference is not currently supported in a set in ANTLR4 Grammar

I am trying to port Chris Lambro's ANTLR3 JavaScript grammar to ANTLR4. I am getting the following error: Rule reference 'LT' is not currently supported in a set. It occurs in the following code, at ~(LT)*: LineComment : '//' ~(LT)* -> skip ; LT : '\n' …
Gautam
  • 7,868
  • 12
  • 64
  • 105
10
votes
2 answers

Is it possible to parse big file with ANTLR?

Is it possible to instruct ANTLR not to load the entire file into memory? Can it apply rules one by one and generate the topmost list of nodes sequentially while reading the file? Also, is it possible to drop already-analyzed nodes somehow?
Suzan Cioc
  • 29,281
  • 63
  • 213
  • 385
10
votes
1 answer

Group terminals into set

What does this warning mean? How do I solve it? Here is the code I am referring to: expression : expression operator=DIV expression | expression operator=MUL expression | expression operator=ADD expression |…
Gautam
  • 7,868
  • 12
  • 64
  • 105
9
votes
1 answer

ANTLRInputStream and ANTLRFileStream are deprecated, what are the alternatives?

If I use ANTLRFileStream antlrFileStream = new ANTLRFileStream("myfile.testlang"); or ANTLRInputStream input = new ANTLRInputStream( new FileInputStream("myfile.testlang") ); Compiler shows deprecation error for both the classes what is…
Ameer Tamboli
  • 1,218
  • 12
  • 20
9
votes
0 answers

class constructor cannot be invoked without 'new'

I am trying to use a 3rd-party TypeScript library (antlr - https://github.com/tunnelvisionlabs/antlr4ts) in my Angular 2 project created using angular-cli. It's failing with this error: class constructor MyLexer cannot be invoked without 'new'. If you…
user911
  • 1,509
  • 6
  • 26
  • 52
9
votes
1 answer

Token Aliases in Antlr

I have rules that look something like this: INTEGER : [0-9]+; field3 : INTEGER COMMA INTEGER; In the parse tree I get a List called INTEGER with two elements. I would rather find a way for each of the elements to be named. But if I…
Be Kind To New Users
  • 9,672
  • 13
  • 78
  • 125
9
votes
1 answer

PL/SQL ANTLR grammar fails for some PL/SQL files?

I am using ANTLR4 to generate an Abstract Syntax Tree (AST) for PL/SQL code. For some queries it works fine, but for some code it generates an AST with only one node, which is not right. For example: DECLARE a RAW; -- migrate to BLOB …
Chathura Wijeweera
  • 289
  • 1
  • 2
  • 9
9
votes
4 answers

Slow ANTLR4 generated Parser in Python, but fast in Java

I am trying to convert an ANTLR3 grammar to an ANTLR4 grammar, in order to use it with the antlr4-python2-runtime. This grammar is a C/C++ fuzzy parser. After converting it (basically removing tree operators and semantic/syntactic predicates), I…
Vektor88
  • 4,841
  • 11
  • 59
  • 111
9
votes
2 answers

Compiling sample ANTLR4 output

From The Definitive ANTLR 4 Reference I have run through the first example, and it has generated the Java target. In the directory C:\JavaLib I have antlr-4.5-complete.jar. When I attempt to compile it with: javac -classpath C:\JavaLib *.java It…
CarbonMan
  • 4,350
  • 12
  • 54
  • 75
9
votes
4 answers

ANTLR4: Using non-ASCII characters in token rules

On page 74 of the ANTLR4 book it says that any Unicode character can be used in a grammar simply by specifying its codepoint in this manner: '\uxxxx' where xxxx is the hexadecimal value for the Unicode codepoint. So I used that technique in a token…
Roger Costello
  • 3,007
  • 1
  • 22
  • 43
9
votes
3 answers

cannot create implicit token for string literal in non-combined grammar

So I found a nice grammar for a calculator and copied it, with some small changes, from here: https://dexvis.wordpress.com/2012/11/22/a-tale-of-two-grammars/ I have two files: Parser and Lexer. They look like this: parser grammar Parser; options{ …
FelRPI
  • 429
  • 6
  • 15
9
votes
2 answers

Handling String Literals which End in an Escaped Quote in ANTLR4

How do I write a lexer rule to match a String literal which does not end in an escaped quote? Here's my grammar: lexer grammar StringLexer; // from The Definitive ANTLR 4 Reference STRING: '"' (ESC|.)*? '"'; fragment ESC : '\\"' | '\\\\' ; Here's…
hendryau
  • 426
  • 3
  • 14
9
votes
1 answer

ANTLRv4: non-greedy rules

I'm reading The Definitive ANTLR 4 Reference and have a question regarding one of the examples (p. 76): STRING: '"' (ESC|.)*? '"'; fragment ESC: '\\"' | '\\\\' ; The rule matches a typical C++ string - a char sequence included in "", which can…
Andy
  • 634
  • 7
  • 19
9
votes
2 answers

Matching arbitrary text (both symbols and spaces) with ANTLR?

How do I match any text in ANTLRv4? I mean text which is unknown at the time of grammar writing. My grammar is as follows: grammar Anytext; line : comment; comment : '#' anytext; anytext: ANY*; WS : [ \t\r\n]+; ANY : .; And my code is…
Suzan Cioc
  • 29,281
  • 63
  • 213
  • 385
9
votes
1 answer

ANTLR 4 lexer tokens inside other tokens

I have the following grammar for ANTLR 4: grammar Pattern; //parser rules parse : string LBRACK CHAR DASH CHAR RBRACK ; string : (CHAR | DASH)+ ; //lexer rules DASH : '-' ; LBRACK : '[' ; RBRACK : ']' ; CHAR : [A-Za-z0-9] ; And I'm…
Charles
  • 365
  • 3
  • 13