Is it possible to use the auto-generated ANTLR parser (or its grammar) from an Xtext project?

Question

I was wondering whether it is possible to take the ANTLR grammar (*.g), or the parsers generated from it, and use them in a separate project.

For this I was looking into the (Eclipse-based) SysMLv2 project on GitHub, where Xtext was used to define the grammar of this new modelling language. The grammar and the generated parsers can be found here.

My first idea was simply to take the grammar file (InternalAlf.g) and use ANTLR (I tried 3.5.0 and 3.5.2) to generate the parser and lexer. Doing this, I end up with a bunch of error messages saying that symbols were not found (the symbol in question: EObject).

Then, since it is obviously an Eclipse project, I figured another naive solution would be to package the whole project as a JAR and include it as a library in mine. I tried to use Eclipse for that (export -> executable jar). That option requires a main class, and I am not sure which one to pick, which also makes me doubt this approach. The other JAR export option does not let me add the necessary dependencies to my JAR.

Any other proposals? Since the ANTLR grammar file is available, it should actually be quite easy to generate the parser, but I am not sure how to do this, since the grammar file has a bunch of dependencies. Or, to rephrase the question: how do I deal with this type of ANTLR grammar file, one that depends on Java libraries? As a newcomer to ANTLR and Xtext, I could not find the answer in the typical ANTLR tutorials.

best regards

Of course, it's possible. I would suggest moving it to Antlr4, though. There are a few syntactic predicates (`=>`) and gated semantic predicates (`?=>`) in the grammar, which are unnecessary with Antlr4. There are also rule attribute definitions, which are likely unnecessary. And you would have to strip all the unnecessary `@header` code, too. I have tools that do the conversion automatically, but they crap out on this grammar--for the moment. — kaby76, Aug 25 '21 at 20:23
I converted your grammar to Antlr4 and generated a parser driver for it, but it doesn't work out of the box on any SysML examples. The parser fails immediately on any input because *whitespace is not ignored*, which comes directly from that Antlr3 grammar--last line. Since whitespace is not explicitly mentioned elsewhere in the Antlr3 grammar, XText must do piecemeal parsing of the input. What a strange system. I would start from the XText input grammar and scrape your DSL grammar from that. — kaby76, Aug 25 '21 at 23:14
I am not sure I follow. In ANTLR 3 you configure hidden tokens on CommonTokenStream (see the XtextTokenStream subclass in Xtext). — Christian Dietrich, Aug 26 '21 at 07:36
PS: Can you give some hints on "how" you want to use the parser? If you just want to work with the AST, you can perfectly well use Xtext in "standalone mode"; https://stackoverflow.com/questions/44716914/text-file-parsing-java-bean-instantiation-with-mwe2-xtext/44787099#44787099 will give some starting hints. — Christian Dietrich, Aug 26 '21 at 07:40
Kaby's point is that these space tokens (the rule `RULE_WS` in the InternalAlf.g grammar) are not discarded/handled in the grammar itself. This is probably handled by some XText logic. In other words: the InternalAlf.g cannot be used as-is because XText does some magic behind the scenes. — Bart Kiers, Aug 26 '21 at 07:40
Antlr3 provides a `skip()` method and `$channel` for whitespace. Examples are [here](https://github.com/antlr/grammars-v4/blob/476c1bdcbc9b16ae19e5387cccc05e7b24293ebc/antlr/antlr3/examples/C.g#L538) and [here](https://github.com/antlr/grammars-v4/blob/476c1bdcbc9b16ae19e5387cccc05e7b24293ebc/antlr/antlr3/examples/Java.g#L1403). Antlr4 added syntax to support this directly. That's how most people implement WS with Antlr: in the grammar. Other parser generators do the same thing (Flex, Javacc, Rex, Pegjs, Lark, etc.), but with different syntax. It should be placed in the grammar. WS is syntax. — kaby76, Aug 26 '21 at 10:56
Note, even after adding `-> skip` to the usual (comments, whitespace), the grammar still does not work. https://gist.github.com/kaby76/9b57c46e68851f39f20b091d15980a87 . The Antlr4 grammar is automatically generated from the Antlr3 grammar using several tools I wrote for refactoring and converting parser generator grammars. I just don't know how XText uses the Antlr3 grammar, but my guess is that it wants to support incremental parsing and semantic analysis in the IDE. — kaby76, Aug 26 '21 at 11:15
OK, after poking around a bit, I found out that the grammar isn't for SysML per se, but for ".kerml" files, e.g., [here](https://github.com/Systems-Modeling/SysML-v2-Pilot-Implementation/tree/master/kerml/src/examples). I tested it out on a few. The Antlr4 grammar works perfectly fine, with mods! — kaby76, Aug 26 '21 at 13:01
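
The whitespace disagreement in the comments above comes down to where whitespace gets filtered: inside the grammar (`skip()` or `$channel`), or in the token stream that sits between lexer and parser. Below is a minimal Java sketch of the second design, assuming a toy lexer. It does not use the ANTLR or Xtext APIs, and the names `HiddenChannelDemo`, `Token`, `lex`, and `visibleToParser` are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenChannelDemo {
    public static final int DEFAULT_CHANNEL = 0;
    public static final int HIDDEN_CHANNEL = 1;

    /** A token: its text plus the channel the lexer emitted it on. */
    public static final class Token {
        public final String text;
        public final int channel;

        public Token(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    /**
     * Toy lexer (an assumption, not ANTLR): splits the input into runs of
     * whitespace and runs of non-whitespace. Whitespace is NOT dropped here;
     * it is only tagged with the hidden channel.
     */
    public static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            boolean ws = Character.isWhitespace(input.charAt(i));
            int start = i;
            while (i < input.length()
                    && Character.isWhitespace(input.charAt(i)) == ws) {
                i++;
            }
            int channel = ws ? HIDDEN_CHANNEL : DEFAULT_CHANNEL;
            tokens.add(new Token(input.substring(start, i), channel));
        }
        return tokens;
    }

    /** The stream-side filter: the parser only sees default-channel tokens. */
    public static List<String> visibleToParser(List<Token> tokens) {
        List<String> visible = new ArrayList<>();
        for (Token t : tokens) {
            if (t.channel == DEFAULT_CHANNEL) {
                visible.add(t.text);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Token> all = lex("package  Foo ;");
        System.out.println("tokens from lexer: " + all.size());
        System.out.println("tokens seen by parser: " + visibleToParser(all));
    }
}
```

If the stream-side filter is missing, as it presumably is when InternalAlf.g is compiled on its own without Xtext's token stream, the parser receives the whitespace tokens as well. That would be consistent with the immediate parse failures described above.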

score 2 · Answer 1 · edited Aug 26 '21 at 07:30

I looked at the grammar in that project. It is HIGHLY specific to Xtext (to the point that it’s a bit difficult to find the ANTLR grammar amongst all of the actions).

You might be able to use the ANTLR3 grammar to parse it and discard all of the actions, etc. that make it so tightly coupled to Xtext (being careful about any semantic predicates and dependencies they might have on those actions). Emphasis on the MIGHT here.
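
As a rough illustration of what discarding the actions involves, here is a minimal Java sketch that removes balanced `{...}` blocks from grammar text while leaving single-quoted grammar literals such as `'{'` intact. It is a toy under stated assumptions, not a converter: the class and method names are hypothetical, comments in the grammar are not handled, and it would also delete `options {...}` and `tokens {...}` blocks while leaving behind `@header`, `returns [...]`, and predicate markers such as `?=>`.

```java
public class ActionStripper {

    /**
     * Removes every balanced {...} block from the given grammar text.
     * Braces inside quoted literals are ignored when counting depth:
     * single-quoted literals anywhere, double-quoted strings only inside
     * an action (where they would be Java string literals).
     */
    public static String stripActions(String grammar) {
        StringBuilder out = new StringBuilder();
        int depth = 0;   // current nesting depth of { ... }
        char quote = 0;  // 0 = not inside a literal, else the opening quote
        for (int i = 0; i < grammar.length(); i++) {
            char c = grammar.charAt(i);
            if (quote != 0) {
                // Inside a literal: copy it only if we are outside any action.
                if (depth == 0) out.append(c);
                if (c == '\\' && i + 1 < grammar.length()) {
                    i++; // consume the escaped character as part of the literal
                    if (depth == 0) out.append(grammar.charAt(i));
                } else if (c == quote) {
                    quote = 0; // closing quote
                }
                continue;
            }
            if (c == '\'' || (c == '"' && depth > 0)) {
                quote = c;
                if (depth == 0) out.append(c);
            } else if (c == '{') {
                depth++;               // entering an action: drop the brace
            } else if (c == '}' && depth > 0) {
                depth--;               // leaving an action: drop the brace
            } else if (depth == 0) {
                out.append(c);         // ordinary grammar text
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical rule, loosely in the style of an Xtext-generated grammar.
        String rule = "a:{x();}'{' B{y(\"}\");};";
        System.out.println(stripActions(rule));
    }
}
```

Even with the actions gone, the semantic predicates and the dependencies they have on the removed code still need manual attention, which is the caveat raised in the paragraph above.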

In short, it’s not going to be at all simple to generate a parser divorced from Xtext using this grammar.

If you elaborate on what you need to accomplish, and on why you feel the need to create a separate parser rather than just using the Xtext-based SysMLv2 implementation, someone might be able to point you in an appropriate direction.

This is why Xtext offers to generate the debug grammar, so that you are free of the AST-creating actions. — Christian Dietrich, Aug 26 '21 at 07:36
