What is the best way to feed ANTLR with huge numbers of tokens? Say we have a list of 100,000 English verbs; how could we add them to our grammar? We could of course include a huge grammar file like verbs.g, but maybe there is a more elegant way, e.g. by modifying a .tokens file?

grammar verbs;

VERBS
    : 'eat'
    | 'drink'
    | 'sit'
    ...
    | 'sleep'
    ;

Also, should these rather be lexer or parser rules, i.e. VERBS: or verbs:? Probably VERBS:.

Team Pannous
  • Update: A file english_verbs.g fails to be consumed by ANTLR even though no special characters occur: at org.antlr.tool.GrammarSanity.traceStatesLookingForLeftRecursion(GrammarSanity.java:149) ... (repeated 10^99 times) – Team Pannous Feb 09 '12 at 01:48
  • Whatever you do would probably test the limits of the recognizer. – Sergey Kalinichenko Feb 09 '12 at 01:53
  • No, there's no way you can create a lexer with that many rules. For a workaround, see: http://stackoverflow.com/questions/9008134/dynamically-create-lexer-rule – Bart Kiers Feb 10 '12 at 08:05
  • Anyway, I recommend using the lexer for this. – petrbel Jun 25 '14 at 18:20

1 Answer

I would rather use semantic predicates.

For this you have to define a generic word token:

WORD : [a-z]+ ;

and at every site where you want a verb (instead of a generic word), put a semantic predicate that checks whether the parsed word is in the list of verbs.
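A minimal sketch of this idea in ANTLR 4 syntax (Verbs.g4), assuming the verbs live in a plain-text file with one lowercase verb per line; the file name english_verbs.txt and the loadVerbs helper are illustrative, not part of the original answer (Java 8 target):

grammar Verbs;

@parser::members {
    // Assumed helper: load the verb list (one verb per line) into a
    // set once, when the generated parser class is loaded.
    static final java.util.Set<String> VERBS = loadVerbs("english_verbs.txt");

    static java.util.Set<String> loadVerbs(String path) {
        try {
            return new java.util.HashSet<String>(
                java.nio.file.Files.readAllLines(java.nio.file.Paths.get(path)));
        } catch (java.io.IOException e) {
            throw new RuntimeException("cannot read verb list: " + path, e);
        }
    }
}

// A verb is any generic word whose text occurs in the set; the
// predicate rejects the match for every other word.
verb : w=WORD { VERBS.contains($w.text) }? ;

WORD : [a-z]+ ;
WS   : [ \t\r\n]+ -> skip ;

Since the verb list is loaded at run time, adding a verb means editing the text file; neither the grammar nor the generated code has to change.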

I recommend not hard-coding the verbs in the parser/lexer for such a task:

  • each additional verb would change the grammar
  • each additional verb enlarges the generated code

With the predicate approach, on the other hand:

  • conjugation is easier to handle
  • upper/lower case can be handled more easily
CoronA