What is the best way to feed ANTLR with huge numbers of tokens? Say we have a list of 100,000 English verbs; how could we add them to our grammar? We could of course include a huge grammar file like verbs.g, but maybe there is a more elegant way, e.g. by modifying a .tokens file?

grammar verbs;

VERBS
    : 'eat'
    | 'drink'
    | 'sit'
    ...
    | 'sleep'
    ;

Also, should these rather be lexer or parser rules, i.e. VERBS: or verbs:? Probably VERBS:.

Team Pannous
  • Update: A file english_verbs.g fails to be consumed by ANTLR even though no special characters occur: at org.antlr.tool.GrammarSanity.traceStatesLookingForLeftRecursion(GrammarSanity.java:149) ... (repeated 10^99 times) – Team Pannous Feb 09 '12 at 01:48
  • Whatever you do would probably test the limits of the recognizer. – Sergey Kalinichenko Feb 09 '12 at 01:53
  • No, there's no way you can create a lexer with that many rules. For a workaround, see: http://stackoverflow.com/questions/9008134/dynamically-create-lexer-rule – Bart Kiers Feb 10 '12 at 08:05
  • Anyway, I recommend using the lexer for this. – petrbel Jun 25 '14 at 18:20

1 Answer

I would rather use semantic predicates.

For this you have to define a generic word token:

WORD : [a-z]+ ;

and at every site where you want a verb (instead of a generic word), put a semantic predicate that checks whether the parsed word is in the list of verbs.
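A minimal sketch of this idea in ANTLR 4 syntax (Verbs.g4), assuming the verbs live in a plain-text file with one lowercase verb per line; the file name english_verbs.txt and the loadVerbs helper are illustrative, not part of the original answer (Java 8 target):

grammar Verbs;

@parser::members {
    // Assumed helper: load the verb list (one verb per line) into a
    // set once, when the generated parser class is loaded.
    static final java.util.Set<String> VERBS = loadVerbs("english_verbs.txt");

    static java.util.Set<String> loadVerbs(String path) {
        try {
            return new java.util.HashSet<String>(
                java.nio.file.Files.readAllLines(java.nio.file.Paths.get(path)));
        } catch (java.io.IOException e) {
            throw new RuntimeException("cannot read verb list: " + path, e);
        }
    }
}

// A verb is any generic word whose text occurs in the set; the
// predicate rejects the match for every other word.
verb : w=WORD { VERBS.contains($w.text) }? ;

WORD : [a-z]+ ;
WS   : [ \t\r\n]+ -> skip ;

Since the verb list is loaded at run time, adding a verb means editing the text file; neither the grammar nor the generated code has to change.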

I recommend not hard-coding the verbs in the parser/lexer for such a task:

  • each additional verb would change the grammar
  • each additional verb enlarges the generated code

With the predicate approach, on the other hand:

  • conjugation is easier to handle
  • upper/lower case can be handled more easily
CoronA