I need to parse files written in several languages (Java, C, C#, ...) and then write the AST (abstract syntax tree) out as XML. (The actual aim is to manipulate it and translate it to another language; this second part has already been implemented.) After some investigation, I found that there is no common approach to doing this.

The closest one is srcML. But the first problem is that it is not Java =). The second problem is the number of supported languages (only 3).

I know that DMS can solve this problem, but it is neither free nor open source.

So, as I understand it, there is only one way to do this: take ANTLR and try to convert its AST to XML. So the question is how to do this with ANTLR (Java), or am I missing some other (non-ANTLR) way to do it?

Ira Baxter
Michael
  • Bart Kiers observes correctly that ANTLR has a large number of grammars (as does JavaCC [and implicitly other free parser generator tools]), and that many of them are not quite ready for prime time. At the risk of offending the open source community, sometimes you get exactly what you pay for. DMS is not open source (the apparent objection here) but our grammars *are* production ready and we view it as our business to keep them that way. As a practical matter, IMHO, having the XML really doesn't help very much; you want additional machinery to do the job you intend, thus the real point of DMS. – Ira Baxter Nov 02 '11 at 17:06
  • All that said, if you really want just the ASTs as XML, and insist on open source, I'd heartily recommend ANTLR as the best choice. Adding a hack to walk the AST and dump XML should be pretty easy to do. – Ira Baxter Nov 02 '11 at 17:10
  • ... you are taking *just* ASTs and translating them to another language? I understand that the ASTs are a "necessary" first step, but I don't see how you can build a translator, let alone a good one, without resolving names to scopes and types. And that's hard for most languages, and harder for the modern popular ones (including being an outright monstrosity for C++ and worse than you'd expect for Java with generics). None of the "parser generator tools", including ANTLR, offer much help here. DMS language front ends provide this where we've had the energy to do it (Java through 1.6, C++, ...) – Ira Baxter Nov 02 '11 at 17:27
  • ... as a general rule, an AST isn't enough to manipulate a language (let alone translate it). See http://www.semanticdesigns.com/Products/DMS/LifeAfterParsing.html. – Ira Baxter May 31 '12 at 03:36

1 Answer

There are more Java tools besides ANTLR that can do this (JavaCC is a popular alternative, to name just one).

Using a parser generator to solve this problem, you'd need to do the following:

  1. define a grammar and generate a lexer and parser from it (in your case, you need 3 grammars for your 3 languages);
  2. iterate over the AST your parser created, and output plain text (XML, in your case).
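
To give an idea of what step 2 amounts to, here is a minimal sketch of a recursive walk that dumps a tree as XML. Note that `Node` is a hypothetical stand-in class invented for this example, not ANTLR's or JavaCC's tree type, and the `<node type="..." text="...">` layout is just one possible choice. With a real parser you would replace `Node` with whatever tree class the generated parser produces and read the node type, token text and children from that.

```java
import java.util.ArrayList;
import java.util.List;

public class AstToXml {

    /** Hypothetical minimal AST node: a type name, optional token text, and children. */
    static final class Node {
        final String type;
        final String text; // may be null for nodes without token text
        final List<Node> children = new ArrayList<>();

        Node(String type, String text) {
            this.type = type;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Escapes the characters that are not allowed in an XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Serializes the tree rooted at the given node, one element per line. */
    static String toXml(Node root) {
        StringBuilder out = new StringBuilder();
        write(root, 0, out);
        return out.toString();
    }

    /** Depth-first walk: open tag, then children, then close tag. */
    static void write(Node n, int depth, StringBuilder out) {
        indent(depth, out);
        out.append("<node type=\"").append(escape(n.type)).append('"');
        if (n.text != null) {
            out.append(" text=\"").append(escape(n.text)).append('"');
        }
        if (n.children.isEmpty()) {
            out.append("/>\n"); // leaf: self-closing element
            return;
        }
        out.append(">\n");
        for (Node child : n.children) {
            write(child, depth + 1, out);
        }
        indent(depth, out);
        out.append("</node>\n");
    }

    static void indent(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) {
        // Hand-built tree for the expression "1 + 2"; a real parser would produce this.
        Node tree = new Node("add", "+")
                .add(new Node("literal", "1"))
                .add(new Node("literal", "2"));
        System.out.print(toXml(tree));
    }
}
```

Putting the node type in an attribute rather than using it as the element name avoids having to check that every grammar rule or token name is a legal XML element name.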

Grammars for Java, C# and C are available on ANTLR's Wiki, and I'm sure readily available grammars exist for JavaCC as well (and for other parser generator tools: Google is your friend here). But be aware that it is a Wiki, and many grammars are in an experimental state or contain errors.

You could also skip step #1 and find an existing parser that constructs the AST for you. Then you only need to walk the AST yourself and create XML from it. Here's a Java 5 parser, for example (for the other ones, again, Google is your friend).

Good luck.

Bart Kiers