Questions tagged [antlr4]

Version 4 of ANother Tool for Language Recognition (ANTLR), a flexible lexer/parser generator. ANTLR4 features an enhanced adaptive LL(*) parsing algorithm that improves on the simpler LL(*) algorithm used in ANTLR3.

ANTLR stands for ANother Tool for Language Recognition, a powerful parser generator for reading, processing, executing, or translating structured text or binary files. At its core, ANTLR uses a grammar, with syntax loosely based on Backus–Naur form, to generate a parser. That parser produces easily traversable parse trees, which the user can process further. ANTLR's simple yet powerful design has allowed it to be used in many projects, from the expression evaluator in Apple's Numbers application¹ to JetBrains' IntelliJ IDEA IDE².

The main improvement from ANTLR3 to ANTLR4 is a change in the parsing algorithm. This new variation of the LL(*) parsing algorithm, called adaptive LL(*), pushes all of the grammar analysis effort to runtime, and ANTLR4 can handle directly left-recursive rules (indirect left recursion is still rejected). This new resilience led to the name "Honey Badger", about which Terence Parr had this to say:

ANTLR v4 is called the honey badger release after the fearless hero of the YouTube sensation, "The Crazy Nastyass Honey Badger". To quote the honey badger, ANTLR v4 just doesn't give a damn. It's pretty bad ass. It'll take just about any grammar you give it at parse correctly. And, without backtracking!

-- Terence Parr

(To read more, check out the full conversation!)
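To make the left-recursion point concrete: a rule such as `expr : expr '*' expr | expr '+' expr | INT ;` is accepted by ANTLR4, which rewrites it internally into a loop guarded by operator precedence. The Java sketch below is a hand-written illustration of that general idea (precedence climbing), not ANTLR's generated code; the class name `PrecedenceClimbing`, the operator table, and the `evaluate` helper are all assumptions made up for this example.

```java
import java.util.Map;

// Hand-written sketch of precedence climbing; NOT code generated by ANTLR.
final class PrecedenceClimbing {
    // Hypothetical operator table: a higher number binds more tightly.
    private static final Map<Character, Integer> PRECEDENCE =
            Map.of('+', 1, '-', 1, '*', 2, '/', 2);

    private final String input;
    private int pos = 0;

    private PrecedenceClimbing(String input) {
        this.input = input.replaceAll("\\s+", ""); // ignore whitespace
    }

    // Handles "expr : expr op expr | INT" without left recursion:
    // read one operand, then loop while the next operator binds tightly enough.
    private long parseExpr(int minPrecedence) {
        long left = parseInt();
        while (pos < input.length()) {
            char op = input.charAt(pos);
            Integer prec = PRECEDENCE.get(op);
            if (prec == null || prec < minPrecedence) {
                break;
            }
            pos++; // consume the operator
            long right = parseExpr(prec + 1); // prec + 1 gives left associativity
            left = apply(op, left, right);
        }
        return left;
    }

    private long parseInt() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a number at offset " + pos);
        }
        return Long.parseLong(input.substring(start, pos));
    }

    private static long apply(char op, long a, long b) {
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default:  return a / b; // '/' is the only remaining operator
        }
    }

    // Evaluates a whole expression and rejects trailing input.
    static long evaluate(String text) {
        PrecedenceClimbing parser = new PrecedenceClimbing(text);
        long value = parser.parseExpr(1);
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("unexpected input at offset " + parser.pos);
        }
        return value;
    }
}
```

Calling `PrecedenceClimbing.evaluate("1 + 2 * 3")` yields 7 and `PrecedenceClimbing.evaluate("8 - 3 - 2")` yields 3, so `*` binds tighter than `+` and `-` associates to the left. In an ANTLR4 grammar the same effect comes from the order of the alternatives in the left-recursive rule.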

If you are interested in learning to use ANTLR4, a good place to start would be the official documentation, which provides an excellent introduction to the library itself.

Further Reading:

1 Sourced from a paper written by Terence Parr himself.

2 Sourced from JetBrains' official list of third-party software in IDEA.

3 On January 24th 2013, the www.antlr.org address was changed from pointing at the site for ANTLR version 3 (www.antlr3.org) to the site for ANTLR version 4 (www.antlr4.org). Questions and answers that used www.antlr.org were therefore correct for ANTLR 3.x before this date; such links should be updated to www.antlr3.org for ANTLR 3.x or www.antlr4.org for ANTLR 4.x.

3877 questions
6 votes, 1 answer

What are these odd errors that occur when I attempt to generate C# with ANTLR4?

I'm (now) trying to use ANTLR4 and C# to design a language, and so far I've been fiddling around with it. In the process, I decided to try and create a simple mathematical expression evaluator. In the process, I created the following ANTLR grammar…
Ethan Bierlein
6 votes, 0 answers

How to recognise start-of-line in an Antlr grammar?

In the language I work with, some keywords must be at the start of the line. This is mainly because string values within the language can go over multiple lines, and strings could easily contain these keywords. The old yacc/lex grammar…
wolandscat
6 votes, 1 answer

How to report errors from ANTLR 4 Visitor?

I created a grammar for boolean expressions and now I'm trying to implement visitor for evaluating it. It is told that there is no need to overcomplicate grammar lexer and parser rules with semantic analysis because it is much better to provide…
Sasha
6 votes, 2 answers

AnTLR4 strange behavior in precedence

I have a very simple test grammar as following: grammar Test; statement: expression EOF; expression : Identifier | expression binary_op expression | expression assignment_operator expression | expression '.'…
pinker
6 votes, 1 answer

How to detect beginning of line, or: "The name 'getCharPositionInLine' does not exist in the current context"

I'm trying to create a Beginning-Of-Line token: lexer grammar ScriptLexer; BOL : {getCharPositionInLine() == 0;}; // Beginning Of Line token But the above emits the error The name 'getCharPositionInLine' does not exist in the current context As…
Tar
6 votes, 1 answer

ANTLR4 Lexer getTokens() returning 0 tokens

I'm running code from here: https://github.com/bkiers/antlr4-csv-demo. I want to view the tokens analyzed by the lexer by adding this line: System.out.println("Number of tokens: " + tokens.getTokens().size()) to Main.java: public static void…
Corey Wu
6 votes, 2 answers

Ignoring whitespace (in certain parts) in Antlr4

I am not so familiar with antlr. I am using version 4 and I have a grammar where whitespace is not important in some parts (but it might be in others, or rather its luck). So say we have the following grammar grammar Foo; program : A* ; A : ID '@'…
George Kastrinis
6 votes, 3 answers

How to use ANTLR v4 for syntax highlighting?

I've built a grammar for a DSL and I'd like to display some elements (table names) in some colors. I output HTML from Java. columnIdentifier : columnName=Identifier | tableName=Identifier '.' columnName=Identifier ; Identifier :…
Adrien
6 votes, 2 answers

ANTLR parses greedily even though it can match high priority rule

I am using the following ANTLR grammar to define a function. definition_function : DEFINE FUNCTION function_name '[' language_name ']' RETURN attribute_type '{' function_body '}' ; function_name : id ; language_name : id …
Ayash
6 votes, 1 answer

extra channels in antlr 4.5

I am using antlr 4.5 to build a parser for a language with several special comment formats, which I would like to stream to different channels. It seems antlr 4.5 has been extended with a new construct for declaring extra lexer channels: extract…
remi
6 votes, 1 answer

ANTLR4: ignore white spaces in the input but not those in string literals

I have a simple grammar as follows: grammar SampleConfig; line: ID (WS)* '=' (WS)* string; ID: [a-zA-Z]+; string: '"' (ESC|.)*? '"' ; ESC : '\\"' | '\\\\' ; // 2-char sequences \" and \\ WS: [ \t]+ -> skip; The spaces in the input are completely…
Vikdor
6 votes, 5 answers

how to handling nested comments in antlr lexer

How to handle nested comments in antlr4 lexer? ie I need to count the number of "/*" inside this token and close only after the same number of "*/" have been received. As an example, the D language has such nested comments as "/+ ... +/" For…
R71
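
The counting approach described in this question can be sketched outside ANTLR in plain Java. This is only an illustration of the depth-counting idea, assuming `/*` and `*/` as the delimiters; the class `NestedComments` and the method `skipNestedComment` are hypothetical names, not part of the ANTLR runtime.

```java
// Minimal sketch of depth counting for nested comments; not ANTLR API.
final class NestedComments {
    // Returns the index just past the comment that opens at 'start',
    // or -1 if the input ends before every "/*" has been closed.
    static int skipNestedComment(String text, int start) {
        if (!text.startsWith("/*", start)) {
            throw new IllegalArgumentException("no comment opener at offset " + start);
        }
        int depth = 0;
        int i = start;
        while (i < text.length()) {
            if (text.startsWith("/*", i)) {
                depth++;          // one more level to close
                i += 2;
            } else if (text.startsWith("*/", i)) {
                depth--;          // one level closed
                i += 2;
                if (depth == 0) {
                    return i;     // the outermost comment just ended
                }
            } else {
                i++;
            }
        }
        return -1; // unterminated comment
    }
}
```

For the input `a /* x /* y */ z */ b` with `start` at the first `/*`, the method returns the offset of the text that follows the outer `*/`, so the inner comment does not end the token early.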
6 votes, 1 answer

How do I know where a task came from in gradle?

I have a complicated gradle build system that I inherited. It works pretty well, but includes multiple plugins (java, groovy, antlr, jacoco, jetty, etc.). I could not figure out how to accomplish something, so I did a './gradlew tasks --all'. It…
C Dorman
6 votes, 1 answer

How Get error messages of antlr parsing?

I wrote a grammar with antlr 4.4 like this : grammar CSV; file : row+ EOF ; row : value (Comma value)* (LineBreak | EOF) ; value : SimpleValueA | QuotedValue ; Comma : ',' ; LineBreak : '\r'? '\n' | '\r' …
Hamed F
6 votes, 2 answers

What should the correct grammar be for correct precedence evaluation of +,-,/,*, etc

My grammar has these rules expression : expression EQ conditionalOrExpression #eqExpr | expression NEQ conditionalOrExpression #neqExpr | expression LT conditionalOrExpression #ltExpr |…
XBond