
I've got a grammar rule:

OR
    : '|';

But when I print the AST using:

public static void Preorder(ITree tree, int depth)
{
    if (tree == null)
    {
        return;
    }

    for (int i = 0; i < depth; i++)
    {
        Console.Write("  ");
    }

    Console.WriteLine(tree);

    for (int i = 0; i < tree.ChildCount; i++)
    {
        Preorder(tree.GetChild(i), depth + 1);
    }
}

(thanks, Bart), it displays the actual `|` character. Is there a way to get it to print "OR" instead?

mpen

5 Answers


robert inspired this answer.

if (ExpressionParser.tokenNames[tree.Type] == tree.Text)
    Console.WriteLine(tree.Text);
else
    Console.WriteLine("{0} '{1}'", ExpressionParser.tokenNames[tree.Type], tree.Text);
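Plugged into the question's preorder walk, that labelling rule gives output like the listing below. This is a Python sketch rather than C#, so it runs without ANTLR: `Node` and `TOKEN_NAMES` are hypothetical stand-ins for `ITree` and `ExpressionParser.tokenNames`, and the token numbering is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-in for the generated tokenNames array:
# the index is the token type, the value is the token's name.
TOKEN_NAMES = ["<invalid>", "OR", "ID"]


@dataclass
class Node:
    """Toy AST node: a token type, the matched text, and children."""
    type: int
    text: str
    children: List["Node"] = field(default_factory=list)


def label(node: Node) -> str:
    """Same rule as the answer: bare text if it equals the name, else NAME 'text'."""
    name = TOKEN_NAMES[node.type]
    if name == node.text:
        return node.text
    return f"{name} '{node.text}'"


def preorder(node: Optional[Node], depth: int = 0,
             out: Optional[List[str]] = None) -> List[str]:
    """Collect one indented line per node, parent before children."""
    if out is None:
        out = []
    if node is None:
        return out
    out.append("  " * depth + label(node))
    for child in node.children:
        preorder(child, depth + 1, out)
    return out


tree = Node(1, "|", [Node(2, "a"), Node(2, "b")])
print("\n".join(preorder(tree)))
# OR '|'
#   ID 'a'
#   ID 'b'
```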
mpen
  • Seems like the `tokenNames` array is deprecated (at least in Antlr 4.5). Use the Vocabulary-based approach instead; see the answers below. – Matthew Apr 21 '20 at 07:33

I had to do this a couple of weeks ago, but with the Python ANTLR. It doesn't help you much, but it might help somebody else searching for an answer.

With Python ANTLR, token types are integers. The token text is included in the token object. Here's the solution I used:

import antlrGeneratedLexer

token_names = {}
# .items() works on Python 2 and 3; .iteritems() exists only in Python 2
for name, value in vars(antlrGeneratedLexer).items():
    if isinstance(value, int) and name == name.upper():
        token_names[value] = name

There's no apparent logic to the numbering of tokens (at least, with Python ANTLR), and the token names are not stored as strings except in the module __dict__, so this is the only way of getting to them.
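Here is the same loop wrapped in a function and run against a stand-in module, so it works without a generated lexer. `fake_lexer`, its `OR`/`ID` values and `build_token_names` are hypothetical names chosen for the illustration. The filter is the one from the answer, plus an explicit exclusion of `bool` (which is a subclass of `int`).

```python
import types


def build_token_names(module) -> dict:
    """Map token-type integers to names by scanning a module's globals."""
    names = {}
    for name, value in vars(module).items():
        # Upper-case integer constants are treated as token types;
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(value, int) and not isinstance(value, bool) \
                and name == name.upper():
            names[value] = name
    return names


# Hypothetical stand-in for antlrGeneratedLexer so the sketch runs on its own.
fake_lexer = types.ModuleType("antlrGeneratedLexer")
fake_lexer.OR = 4
fake_lexer.ID = 5
fake_lexer.helper_count = 7  # lower-case name: ignored by the filter

token_names = build_token_names(fake_lexer)
print(token_names[4])  # OR
```

One caveat of this design: any upper-case integer in the module's namespace, including constants brought in by a wildcard import, also lands in the mapping, and when two names share a value the one seen last wins.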

I would guess that in C# token types are in an enumeration, and I believe enumerations can be printed as strings. But that's just a guess.

robert
  • Bingo! `Console.WriteLine(ExpressionParser.tokenNames[tree.Type]);` The `int` is stored in `tree.Type` and the "dict" is stored in `___Parser.tokenNames`. – mpen Dec 09 '10 at 23:20

Boy, I spent way too much time banging my head against a wall trying to figure this out. Mark's answer gave me the hint I needed, and it looks like the following will get the token name from a TerminalNode in Antlr 4.5:

myLexer.getVocabulary().getSymbolicName(myTerminalNode.getSymbol().getType())

or, in C#:

myLexer.Vocabulary.GetSymbolicName(myTerminalNode.Symbol.Type)

(Looks like you can actually get the vocabulary from either the parser or the lexer.)

Those vocabulary methods seem to be the preferred way to get at the tokens in Antlr 4.5, and tokenNames appears to be deprecated.

It does seem needlessly complicated for what I think is a pretty basic operation, so maybe there's an easier way.
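To see why the symbolic name (rather than a display name) is the one to ask for here, the sketch below models a vocabulary's lookups in plain Python. `MiniVocabulary` and its method names are hypothetical, not the runtime's API. The fallback order in `get_display_name` (literal name, then symbolic name, then the raw number) is assumed to mirror ANTLR 4's reference vocabulary implementation; verify it against your runtime. For `OR : '|';` the literal name is `'|'` and the symbolic name is `OR`.

```python
from typing import List, Optional

EOF = -1  # ANTLR's end-of-file token type


class MiniVocabulary:
    """Hypothetical stand-in modelled on ANTLR 4's vocabulary lookups."""

    def __init__(self, literal_names: List[Optional[str]],
                 symbolic_names: List[Optional[str]]):
        self.literal_names = literal_names
        self.symbolic_names = symbolic_names

    def get_literal_name(self, token_type: int) -> Optional[str]:
        if 0 <= token_type < len(self.literal_names):
            return self.literal_names[token_type]
        return None

    def get_symbolic_name(self, token_type: int) -> Optional[str]:
        if 0 <= token_type < len(self.symbolic_names):
            return self.symbolic_names[token_type]
        if token_type == EOF:
            return "EOF"
        return None

    def get_display_name(self, token_type: int) -> str:
        # Assumed order: literal name, then symbolic name, then the number.
        for name in (self.get_literal_name(token_type),
                     self.get_symbolic_name(token_type)):
            if name is not None:
                return name
        return str(token_type)


# Hypothetical numbering: token type 1 is  OR : '|' ;
vocab = MiniVocabulary([None, "'|'"], [None, "OR"])
print(vocab.get_symbolic_name(1))  # OR
print(vocab.get_display_name(1))   # '|'
```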

the klaus

I'm new to Antlr, but it seems that ITree is not directly tied to a Parser (in .NET). However, there is a derived interface, IParseTree, returned from the Parser (in Antlr4), and it contains a few additional methods, including this overload:

string ToStringTree(Parser parser);

It converts the whole node subtree into a text representation, which is useful in some cases. If you'd like to see just the name of a specific node without its children, use the static method in the Trees class:

public static string GetNodeText(ITree t, Parser recog);

This method does basically the same as Mark and Robert suggested, but in a more general and flexible way.
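ToStringTree renders a subtree in a LISP-like form: a leaf prints as its label, and an inner node prints as `(label child child ...)`. The sketch below imitates only that output shape with a hypothetical `Node` class; it does not call ANTLR, skips any escaping the real runtime applies to node text, and takes each label as given rather than computing it the way GetNodeText does. The rule names `expr` and `atom` are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy parse-tree node: a label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def to_string_tree(node: Node) -> str:
    # Leaves print as their bare label; inner nodes as "(label child ...)".
    if not node.children:
        return node.label
    inner = " ".join(to_string_tree(child) for child in node.children)
    return f"({node.label} {inner})"


# Hypothetical rules "expr" and "atom"; the "|" leaf stands for the OR token.
tree = Node("expr", [Node("atom", [Node("a")]), Node("|"),
                     Node("atom", [Node("b")])])
print(to_string_tree(tree))  # (expr (atom a) | (atom b))
```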

Sasha

In addition to robert's Python answer (and hopefully this will be useful for other languages too):

If using the nextToken() method of your generated lexer, you can use the 'type' property of the lexer (not the token, unintuitively enough) to get the numeric code given to the token type by the lexer. In the lexer itself you can see which type got which number. Hope this is helpful.
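A sketch of that loop, under loud assumptions: `StubLexer` is a hypothetical stand-in, not ANTLR's lexer class, and it only mimics the two behaviours described here (a `nextToken()` method, and the numeric type being readable from the lexer's `type` attribute). The `EOF` value, the token numbering and the name mapping are also invented for the example; a real generated lexer's end-of-input handling may differ.

```python
from collections import namedtuple

EOF = -1  # assumed end-of-input token type for this stub

Token = namedtuple("Token", "text")


class StubLexer:
    """Hypothetical lexer: nextToken() returns a token and records its
    numeric type on the lexer's own `type` attribute."""

    def __init__(self, pairs):
        self._pairs = list(pairs)  # (type, text) pairs still to emit
        self.type = None

    def nextToken(self):
        if not self._pairs:
            self.type = EOF
            return Token("<EOF>")
        self.type, text = self._pairs.pop(0)
        return Token(text)


def dump_tokens(lexer, token_names):
    """Drain the lexer, naming each token via the lexer's `type` attribute."""
    lines = []
    while True:
        token = lexer.nextToken()
        if lexer.type == EOF:
            break
        name = token_names.get(lexer.type, str(lexer.type))
        lines.append(f"{name} {token.text!r}")
    return lines


# Hypothetical mapping, e.g. one built by scanning the lexer module.
token_names = {4: "OR", 5: "ID"}
lexer = StubLexer([(5, "a"), (4, "|"), (5, "b")])
print(dump_tokens(lexer, token_names))
```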