ANTLR: Get token name?

Question

I've got a grammar rule,

OR
    : '|';

But when I print the AST using,

public static void Preorder(ITree tree, int depth)
{
    if (tree == null)
    {
        return;
    }

    for (int i = 0; i < depth; i++)
    {
        Console.Write("  ");
    }

    Console.WriteLine(tree);

    for (int i = 0; i < tree.ChildCount; ++i)
    {
        Preorder(tree.GetChild(i), depth + 1);
    }
}

(Thanks Bart) it displays the actual | character. Is there a way I can get it to say "OR" instead?

score 10 · Accepted Answer · edited May 23 '17 at 11:53


robert inspired this answer.

if (ExpressionParser.tokenNames[tree.Type] == tree.Text)
    Console.WriteLine(tree.Text);
else
    Console.WriteLine("{0} '{1}'", ExpressionParser.tokenNames[tree.Type], tree.Text);
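
Combining the check above with the traversal from the question, here is a minimal Python sketch of the same idea. It is not the ANTLR API: `Node`, `TOKEN_NAMES`, and the token type numbers are hypothetical stand-ins for the tree nodes and the generated `tokenNames` array.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the generated tokenNames array (index = token type).
TOKEN_NAMES = ["<invalid>", "OR", "ID"]


@dataclass
class Node:
    """Hypothetical stand-in for an AST node with a token type and text."""
    type: int
    text: str
    children: list = field(default_factory=list)


def label(node):
    """Return the token name, plus the quoted text when the two differ."""
    name = TOKEN_NAMES[node.type]
    return name if name == node.text else f"{name} '{node.text}'"


def preorder(node, depth=0, out=None):
    """Collect one indented label per node, parents before children."""
    out = [] if out is None else out
    out.append("  " * depth + label(node))
    for child in node.children:
        preorder(child, depth + 1, out)
    return out


# A tiny tree for the expression a | b.
tree = Node(1, "|", [Node(2, "a"), Node(2, "b")])
print("\n".join(preorder(tree)))
```

With this made-up tree, the root is labelled with the token name `OR` followed by its text, and the children are indented one level beneath it.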


answered Dec 09 '10 at 23:25

mpen


It seems the `tokenNames` array is deprecated (at least in ANTLR 4.5). Use the Vocabulary-based approach instead; see the answers below. – Matthew, Apr 21 '20 at 07:33

score 9 · Answer 2 · answered Dec 09 '10 at 23:01

I had to do this a couple of weeks ago, but with the Python ANTLR. It doesn't help you much, but it might help somebody else searching for an answer.

With Python ANTLR, token types are integers. The token text is included in the token object. Here's the solution I used:

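import antlrGeneratedLexer

# Build a reverse map from token type (int) to token name (str).
token_names = {}
# .items() works on Python 2 and 3; the original used .iteritems() (Python 2 only).
for name, value in antlrGeneratedLexer.__dict__.items():
    # Token types are the upper-case, integer-valued module attributes.
    if isinstance(value, int) and name == name.upper():
        token_names[value] = name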

There's no apparent logic to the numbering of tokens (at least, with Python ANTLR), and the token names are not stored as strings except in the module __dict__, so this is the only way of getting to them.
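
To try this reverse lookup without an ANTLR install, a sketch along these lines should work on Python 3. The `fake_lexer` module and its token values are invented for illustration; a real generated lexer module would take its place.

```python
import types

# Hypothetical stand-in for an ANTLR-generated lexer module; the
# constant names and values here are invented for the example.
fake_lexer = types.ModuleType("fake_lexer")
fake_lexer.OR = 4
fake_lexer.AND = 5
fake_lexer.helper_count = 2  # lower-case int: skipped by the filter below


def build_token_names(module):
    """Map each upper-case integer constant's value back to its name."""
    return {
        value: name
        for name, value in vars(module).items()
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, int)
        and not isinstance(value, bool)
        and name == name.upper()
    }


token_names = build_token_names(fake_lexer)
print(token_names[fake_lexer.OR])
```

If two constants share a value, the later one wins in the dict, so the result is only as reliable as the numbering in the generated module.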

I would guess that in C# token types are in an enumeration, and I believe enumerations can be printed as strings. But that's just a guess.

Bingo! `Console.WriteLine(ExpressionParser.tokenNames[tree.Type]);` The `int` is stored in `tree.Type` and the "dict" is stored in `___Parser.tokenNames`. — mpen, Dec 09 '10 at 23:20

the klaus · Answer 3 · 2015-08-21T18:30:46.177

Boy, I spent way too much time banging my head against a wall trying to figure this out. Mark's answer gave me the hint I needed, and it looks like the following will get the token name from a TerminalNode in Antlr 4.5:

myLexer.getVocabulary().getSymbolicName(myTerminalNode.getSymbol().getType())

or, in C#:

myLexer.Vocabulary.GetSymbolicName(myTerminalNode.Symbol.Type)

(Looks like you can actually get the vocabulary from either the parser or the lexer.)

Those vocabulary methods seem to be the preferred way to get at the tokens in Antlr 4.5, and tokenNames appears to be deprecated.

It does seem needlessly complicated for what I think is a pretty basic operation, so maybe there's an easier way.
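
Conceptually, a vocabulary lookup like this is an index into name tables keyed by token type. A rough Python sketch of that idea follows. The `SYMBOLIC_NAMES` and `LITERAL_NAMES` tables and the token numbering are hypothetical, and this is not the runtime's actual implementation.

```python
# Hypothetical name tables, indexed by token type, for a grammar with
#   OR : '|' ;   ID : [a-z]+ ;
SYMBOLIC_NAMES = [None, "OR", "ID"]
LITERAL_NAMES = [None, "'|'", None]


def symbolic_name(token_type):
    """Return the rule name for a token type, or None if it has none."""
    if 0 <= token_type < len(SYMBOLIC_NAMES):
        return SYMBOLIC_NAMES[token_type]
    return None


def display_name(token_type):
    """Prefer the literal text, then the symbolic name, then the number."""
    if 0 <= token_type < len(LITERAL_NAMES) and LITERAL_NAMES[token_type]:
        return LITERAL_NAMES[token_type]
    return symbolic_name(token_type) or str(token_type)


print(symbolic_name(1))
print(display_name(1))
print(display_name(2))
```

The split matters for the original question: the symbolic lookup yields the rule name (`OR`), while a display-style lookup would give back the literal `'|'` again.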

Or in JS: `this.lexer.symbolicNames[node.type]`. Thank you! – rednoyz, Feb 13 '21 at 14:25

Sasha · Answer 4 · 2015-06-16T15:16:54.480

I'm new to Antlr, but it seems ITree has no direct obligation to be related to Parser (in .NET). Instead there is a derived interface, IParseTree, returned from Parser (in Antlr4), which contains a few additional methods, including this override:

string ToStringTree(Parser parser);

It converts the whole node subtree into a text representation, which is useful in some cases. If you want to see just the name of a particular node without its children, use the static method in the Trees class:

public static string GetNodeText(ITree t, Parser recog);

This method does basically the same as Mark and Robert suggested, but in a more general and flexible way.

score 1 · Answer 5 · answered Oct 24 '21 at 13:43

In addition to robert's Python answer (hopefully this is useful for other languages too):

If using the nextToken() method of your generated lexer, you can use the 'type' property of the lexer (not the token, unintuitively enough) to get the numeric code given to the token type by the lexer. In the lexer itself you can see which type got which number. Hope this is helpful.
