Questions tagged [antlr4]

Version 4 of ANother Tool for Language Recognition (ANTLR), a flexible lexer/parser generator. ANTLR4 features an enhanced adaptive LL(*) parsing algorithm that improves on the simpler LL(*) algorithm used in ANTLR3.

ANTLR stands for ANother Tool for Language Recognition, a powerful parser generator for reading, processing, executing, or translating structured text or binary files. At its core, ANTLR uses a grammar, with syntax loosely based on Backus–Naur form, to generate a parser. That parser produces easily traversable parse trees, which the user can process further. ANTLR's simple yet powerful design has allowed it to be used in many projects, from the expression evaluator in Apple's Numbers application [1] to JetBrains' IntelliJ IDEA IDE [2].

The main improvement of ANTLR4 over ANTLR3 is a change in the parsing algorithm. This new variation of the LL(*) parsing algorithm, named adaptive LL(*) or ALL(*), moves all of the grammar analysis effort to runtime, which lets ANTLR handle directly left-recursive rules. This new resilience led to the nickname "Honey Badger", on which Terence Parr had this to say:

ANTLR v4 is called the honey badger release after the fearless hero of the YouTube sensation, "The Crazy Nastyass Honey Badger". To quote the honey badger, ANTLR v4 just doesn't give a damn. It's pretty bad ass. It'll take just about any grammar you give it and parse correctly. And, without backtracking!

-- Terence Parr

(To read more, check out the full conversation!)
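
As a rough illustration of the left-recursion support described above, here is a minimal sketch of an ANTLR4 grammar with a directly left-recursive rule. The grammar name `Expr` and the rule and token names are made up for this example and do not come from any particular project.

```antlr
grammar Expr;   // hypothetical grammar name, for illustration only

// Directly left-recursive rule: `expr` refers to itself as its first element.
// Alternatives listed earlier bind more tightly, so '*' and '/' take
// precedence over '+' and '-'.
expr
    : expr ('*' | '/') expr
    | expr ('+' | '-') expr
    | INT
    | '(' expr ')'
    ;

INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;   // discard whitespace
```

ANTLR3 would reject a rule like `expr` as left-recursive, whereas ANTLR4 accepts it. Only direct left recursion is handled this way; a rule that reaches itself through another rule is still reported as an error.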

If you are interested in learning to use ANTLR4, a good place to start would be the official documentation, which provides an excellent introduction to the library itself.

Further Reading:

1 Sourced from a paper written by Terence Parr himself.

2 Sourced from JetBrains' official list of third-party software in IDEA.

3 On January 24th, 2013, the www.antlr.org address was changed from pointing at the site for ANTLR version 3 (www.antlr3.org) to the site for ANTLR version 4 (www.antlr4.org). Questions and answers that linked to www.antlr.org before this date were therefore referring to ANTLR 3.x. Such links should be updated to www.antlr3.org for ANTLR 3.x or www.antlr4.org for ANTLR 4.x.

3877 questions
5
votes
1 answer

ANTLR4 using HIDDEN channel causes errors while using skip does not

In my grammar I use: WS: [ \t\r\n]+ -> skip; when I change this to use HIDDEN channel: WS: [ \t\r\n]+ -> channel(HIDDEN); I receive errors (extraneous input ' '...) I did not receive while using 'skip'. I thought, that skipping and sending to a…
r2mzes
  • 53
  • 6
5
votes
1 answer

Antlr4: How can I both hide and use Tokens in a grammar

I'm parsing a script language that defines two types of statements; control statements and non control statements. Non control statements are always ended with ';', while control statements may end with ';' or EOL ('\n'). A part of the grammar looks…
paseg
  • 406
  • 4
  • 12
5
votes
1 answer

How to correctly parse a VB Case statement?

I'm trying to parse VBA code, and the 5.4.2.10 section of the spec defines the Select Case statement, which we've defined as follows: // 5.4.2.10 Select Case Statement selectCaseStmt : SELECT whiteSpace? CASE whiteSpace? selectExpression…
Mathieu Guindon
  • 69,817
  • 8
  • 107
  • 235
5
votes
1 answer

Modify expressions, generated by Antlr?

I would like to read expressions with Antlr4 and then perform some modifications on them. For example, if grammar is arithmetic, I would modify expression, representing 2 * (3 + 1) with 2 * 4 and then with 8 This is "calculation" or…
Dims
  • 47,675
  • 117
  • 331
  • 600
5
votes
2 answers

How do I tell an array from a procedure call?

Context I'm parsing vba code, where... This code outputs the contents of the first dimension of array a at index i: Debug.Print a(i, 1) This code outputs the result of function a given parameters i and 1: Debug.Print a(i, 1) This code calls…
Mathieu Guindon
  • 69,817
  • 8
  • 107
  • 235
5
votes
1 answer

antlr4 python target cannot recognize unicode

I have a ID terminator ID : ([A-Z_]|'\u0100'..'\uFFFE') ([A-Z_0-9]|'\u0100'..'\uFFFE')*; and a .txt sample file to parse 均60:=MA(C,60); I generated Java and Python2 target and test each against sample file respectively. Java target can parse…
gzc
  • 8,180
  • 8
  • 42
  • 62
5
votes
1 answer

Extracting specific tags from arbitrary plain text

I want to parse plain text comments and look for certain tags within them. The types of tags I'm looking for look like: Where "name" is a [a-z] string (from a fixed list) and "1234" represents a [0-9]+ number. These tags can occur…
Nick B.
  • 79
  • 6
5
votes
2 answers

How to implement the lexer hack for a C parser in ANTLR

Is it possible to implement the classic Yacc lexer hack to differentiate between identifier names and type names in a C parser generated by ANTLR4, using a standard C grammar (like the one found on the official ANTLR4 GitHub repo) ? It seems ad-hoc…
Nick
  • 761
  • 1
  • 7
  • 20
5
votes
3 answers

Running ANTLR grun (TestRig) on grammar in a package.

I have all the generated java files in a single directory after ANTLR execution, so I used some options to generate a separate directory and namespace to be stored and compiled to store all the generated files. This is the grammar file: grammar…
prosseek
  • 182,215
  • 215
  • 566
  • 871
5
votes
1 answer

antlr literal string matching: what am I doing wrong?

I've been using antlr for 3 days. I can parse expressions, write Listeners, interpret parse trees... it's a dream come true. But then I tried to match a literal string 'foo%' and I'm failing. I can find plenty of examples that claim to do this. I…
Eric Newton
  • 51
  • 1
  • 3
5
votes
1 answer

Testing grammar for ambiguities

I'm writing a grammar for a formal language. Ideally I'd want that grammar to be unambiguous, but that might not be possible. In either case, I want to know about all possible ambiguities while developing the grammar. How can I do that? So far, most…
MvG
  • 57,380
  • 22
  • 148
  • 276
5
votes
2 answers

How do I make the auto-generated parser class implement an interface in ANTLR4?

I am using ANTLR 4 to create a parser, and I have completed my grammar. I need to inject some Java code into the resulting parser file that ANTLR auto-generates for me. If I want to include a method in the resulting parser, I can add this to the…
james.garriss
  • 12,959
  • 7
  • 83
  • 96
5
votes
4 answers

ANTLR cannot generate Javascript code as of version 4.5

When I try to generate the listener/visitor ... for my grammar I get the following error: ANTLR cannot generate Javascript code as of version 4.5 Does anybody know how to fix it? I can still generate C# and Java code.
Bruno
  • 894
  • 11
  • 32
5
votes
1 answer

How should I limit the length of an ID token in ANTLR?

This should be fairly simple. I'm working on a lexer grammar using ANTLR, and want to limit the maximum length of variable identifiers to 32 characters. I attempted to accomplish this with this line(following normal regex - syntax): ID :…
Mahdi Javaheri
  • 1,080
  • 13
  • 25
5
votes
1 answer

ANTLR V4 + Java8 Grammar -> OutOfMemoryException

I'm trying to use ANTLR V4 with the publicly given Java 8 grammar - https://github.com/antlr/grammars-v4/blob/master/java8/Java8.g4 I generated the class files and tried to parse the Java 8 JRE, but somehow at java.text.SimpleDateFormat.java it…
Ronald Duck
  • 323
  • 1
  • 11