Antlr4 language translation - separating template logic from visitor class?

Question

I’m looking at programmatically translating huge amounts of relatively simple TSQL code to Groovy code. There are a number of reasons, sure, but the driving reason is just to see if it can be done and, in the process, learn about compilers, grammars, etc.

Antlr4 seems like the ideal tool for this problem (Java is a plus).

Tokenizing/parsing the TSQL (using a grammar file) and reading the tree using the generated Listener/Visitor is pretty straightforward.

I know I could just build the string representation of the Groovy code inside my inherited visitor, but coupling the matching Groovy token values with my TSQLVisitor doesn’t seem like the cleanest solution.

What would be considered best practice here, and in general for mapping one language to another with Antlr4?

Things I’m considering:

- Using StringTemplate and defining my Groovy code in an STG file (my TSQLVisitor would use these templates and return the full string representation of the Groovy code).
- Switching to Antlr3, which supports adding StringTemplate logic directly in the grammar file.
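The coupling concern above can be addressed without committing to a particular template engine. The following is a minimal, self-contained sketch (no ANTLR dependency) in which all Groovy syntax lives behind an emitter interface and the visitor only decides *what* to emit. `TargetEmitter`, `GroovyEmitter`, and `TsqlTranslator` are hypothetical names; in a real project `TsqlTranslator` would extend the base visitor ANTLR generates from your TSQL grammar (its exact name depends on the grammar) and would receive parse-tree contexts rather than strings.

```java
import java.util.List;

public class EmitterSketch {

    /** Everything that knows Groovy syntax lives behind this interface (hypothetical API). */
    interface TargetEmitter {
        String declare(String name, String initialValue);
        String print(String expression);
        String sequence(List<String> statements);
    }

    /** Groovy-specific rendering; a StringTemplate-backed class could replace this one. */
    static final class GroovyEmitter implements TargetEmitter {
        @Override public String declare(String name, String initialValue) {
            return "def " + name + " = " + initialValue;
        }

        @Override public String print(String expression) {
            return "println(" + expression + ")";
        }

        @Override public String sequence(List<String> statements) {
            return String.join("\n", statements);
        }
    }

    /** Stand-in for the TSQL visitor: it chooses what to emit but never spells out Groovy text. */
    static final class TsqlTranslator {
        private final TargetEmitter out;

        TsqlTranslator(TargetEmitter out) {
            this.out = out;
        }

        // e.g. DECLARE @total INT = 0
        String visitDeclare(String tsqlVariable, String value) {
            return out.declare(stripAt(tsqlVariable), value);
        }

        // e.g. PRINT @total (a simple variable reference only, in this sketch)
        String visitPrint(String tsqlVariable) {
            return out.print(stripAt(tsqlVariable));
        }

        String visitBatch(List<String> translatedStatements) {
            return out.sequence(translatedStatements);
        }

        // TSQL local variables carry a leading '@', which is not legal in a Groovy identifier
        private static String stripAt(String identifier) {
            return identifier.startsWith("@") ? identifier.substring(1) : identifier;
        }
    }

    public static void main(String[] args) {
        TsqlTranslator t = new TsqlTranslator(new GroovyEmitter());
        String groovy = t.visitBatch(List.of(
                t.visitDeclare("@total", "0"),
                t.visitPrint("@total")));
        System.out.println(groovy);
    }
}
```

Replacing `GroovyEmitter` with a class backed by a StringTemplate group, as in the first option listed, keeps the same shape: the visitor picks the construct, and the templates own the target syntax.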

You want to know about best practice? Or best practice using ANTLR? IMHO these are very different. — Ira Baxter, Mar 23 '15 at 01:58
Both, and certainly if Antlr4 is the wrong tool for the job. — vicsz, Mar 23 '15 at 02:01
I think this is what you want: http://stackoverflow.com/a/29196058/120163 — Ira Baxter, Mar 23 '15 at 02:28
Some good high-level background information at the link, including that the thing I'm working on is called a "transpiler", but I was looking more for technical best practices, i.e. where the mapping gets defined. Also, since this is more of a pet project, I would prefer to stick to open-source tools. — vicsz, Mar 23 '15 at 03:20
There are several links on that page. Best practice is "don't use just a parser generator". Unfortunately, that's mostly what *is* available as open source. The only open-source "not-just-a-parser-generator" tools you can get are TXL, Stratego/MPL. Be prepared for a steep learning curve. (The learning curve you get with just a parser generator is even steeper: it includes discovering why that isn't enough, but not before tearing your hair out trying to figure out how to do the job with just that.) — Ira Baxter, Mar 23 '15 at 03:34
Regarding where the mapping is defined: *if* you use "just a parser generator", you'll define the mapping (no choice) as part of a tree walk of the AST. And you're going to discover that this does some easy things easily, and none of the hard things easily, which is why you want all that other machinery. With the kind of tools I think are best practice (DMS, TXL, ...) you define the mapping as rewrite rules. Each system is different in detail. Fine details on ours: http://www.semanticdesigns.com/Products/DMS/DMSRewriteRules.html — Ira Baxter, Mar 23 '15 at 03:38
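For contrast with a hand-written tree walk, here is a toy, plain-Java illustration of defining the mapping as rewrite rules: each rule is a pattern plus a replacement, and a generic engine applies them. It is not DMS or TXL notation and is far weaker than either; the `Node` and `Rule` types and the two sample rules (strip the `@` from TSQL variables, and turn `ISNULL(a, b)` into a Groovy null check) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class RewriteSketch {

    /** Generic tree: a label plus children; leaves carry their text in the label. */
    record Node(String label, List<Node> kids) {
        static Node leaf(String text) { return new Node(text, List.of()); }
        static Node of(String label, Node... kids) { return new Node(label, List.of(kids)); }
    }

    /** A rewrite rule: returns the replacement subtree if the node matches, else empty. */
    interface Rule extends Function<Node, Optional<Node>> {}

    // Rule: a TSQL variable reference '@name' becomes the Groovy identifier 'name'.
    static final Rule STRIP_AT = n ->
            n.kids().isEmpty() && n.label().startsWith("@")
                    ? Optional.of(Node.leaf(n.label().substring(1)))
                    : Optional.empty();

    // Rule: ISNULL(a, b) becomes (a != null ? a : b).
    // Note: 'a' appears twice in the output, so an expression with side effects would run twice.
    static final Rule ISNULL_TO_TERNARY = n ->
            n.label().equals("ISNULL") && n.kids().size() == 2
                    ? Optional.of(Node.of("TERNARY",
                            Node.of("!=", n.kids().get(0), Node.leaf("null")),
                            n.kids().get(0),
                            n.kids().get(1)))
                    : Optional.empty();

    static final List<Rule> RULES = List.of(STRIP_AT, ISNULL_TO_TERNARY);

    /** Rewrites children first, then applies the first matching rule and rewrites its result. */
    static Node rewrite(Node n, List<Rule> rules) {
        List<Node> kids = new ArrayList<>();
        for (Node k : n.kids()) {
            kids.add(rewrite(k, rules));
        }
        Node current = new Node(n.label(), kids);
        for (Rule r : rules) {
            Optional<Node> replaced = r.apply(current);
            if (replaced.isPresent()) {
                return rewrite(replaced.get(), rules);
            }
        }
        return current;
    }

    /** Prints the rewritten (Groovy-side) tree. */
    static String print(Node n) {
        if (n.kids().isEmpty()) {
            return n.label();
        }
        List<Node> k = n.kids();
        switch (n.label()) {
            case "TERNARY":
                return "(" + print(k.get(0)) + " ? " + print(k.get(1)) + " : " + print(k.get(2)) + ")";
            case "!=":
                return print(k.get(0)) + " != " + print(k.get(1));
            default:
                throw new IllegalStateException("No printer for " + n.label());
        }
    }

    public static void main(String[] args) {
        Node source = Node.of("ISNULL", Node.leaf("@discount"), Node.leaf("0"));
        System.out.println(print(rewrite(source, RULES)));
    }
}
```

The point of the style is that adding a mapping means adding a rule rather than editing a visitor method; what the toy omits (patterns written in the surface syntax of both languages, type and flow analysis) is the "other machinery" referred to above.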

score 3 · Accepted Answer · answered Mar 23 '15 at 18:50

Best practice depends on your objective. Where the conversion must not introduce, or must minimize, added technical baggage, performance overhead, or maintenance overhead, Ira's comments apply.

If, however, performance and maintenance are not essential issues, the conversion is close to 1:1 semantically, and you can add run-time support code in the target environment, then an Antlr4-style conversion becomes possible. Of course, the greater the semantic differences between the source and target languages, the more difficult it becomes: the size and complexity of the target run-time support library becomes counter-productive. And it only takes one deep-seated difference to drive a requirement for an analysis tool like the one Ira has developed.
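As a concrete example of such run-time support code: T-SQL and Groovy disagree on some everyday operators. With the default `CONCAT_NULL_YIELDS_NULL ON` setting, T-SQL `'a' + NULL` is `NULL`, while Groovy's `'a' + null` is `'anull'`; and T-SQL `7 / 2` on `INT` operands is `3`, while Groovy's `7 / 2` is the `BigDecimal` `3.5`. The sketch below assumes the translator emits calls into a support class instead of the raw operators; `TsqlRuntime` and its methods are hypothetical, and the class is written in Java, which Groovy code can call directly.

```java
/** Hypothetical run-time support for generated Groovy code: reproduces selected T-SQL semantics. */
public final class TsqlRuntime {

    private TsqlRuntime() {
    }

    /** T-SQL string '+' with CONCAT_NULL_YIELDS_NULL ON: any NULL operand makes the result NULL. */
    public static String concat(String left, String right) {
        if (left == null || right == null) {
            return null;
        }
        return left + right;
    }

    /** T-SQL INT / INT: NULL-propagating and truncating toward zero, as Java int division does. */
    public static Integer divide(Integer left, Integer right) {
        if (left == null || right == null) {
            return null;
        }
        // Throws ArithmeticException on a zero divisor, where T-SQL raises a divide-by-zero error
        return left / right;
    }

    public static void main(String[] args) {
        System.out.println(concat("a", null));
        System.out.println(divide(7, 2));
        System.out.println(divide(-7, 2));
    }
}
```

Each such helper closes one semantic gap, and each also adds to the run-time library whose growth the paragraph above warns about.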

Presuming an adequate Groovy library has been developed, producing the target code is reduced to near one-liners called from the visitor's onEntry and onExit routines. Coupling can be reduced somewhat by abstracting the rendering:

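```java
import java.util.Map;
import java.util.logging.Logger;

import org.stringtemplate.v4.ST;
import org.stringtemplate.v4.STGroupFile;

public class Render {

    /** Identifies the template group a template belongs to. */
    public enum GenType { BLOCK, STMT }

    private static final Logger LOG = Logger.getLogger(Render.class.getName());
    private static final String templateDir = "some/path/to/templates";

    private final STGroupFile blocksGroup;
    private final STGroupFile stmtGroup;

    public Render() {
        // The original used a project helper, Strings.concatAsClassPath(); plain concatenation does the same job.
        // STGroupFile resolves the name as a file path or a classpath resource.
        blocksGroup = new STGroupFile(templateDir + "/Blocks.stg");
        stmtGroup = new STGroupFile(templateDir + "/Statements.stg");
    }

    public String gen(GenType type, String name) {
        return gen(type, name, null);
    }

    /**
     * @param type   identifies the template group
     * @param name   the template name within the group
     * @param varMap the named values to pass to the template (may be null)
     * @return the rendered target code
     */
    public String gen(GenType type, String name, Map<String, Object> varMap) {
        LOG.fine(name);
        STGroupFile stf;
        switch (type) {
            case BLOCK:
                stf = blocksGroup;
                break;
            case STMT:
                stf = stmtGroup;
                break;
            default:
                throw new IllegalArgumentException("Unknown template group: " + type);
        }
        ST st = stf.getInstanceOf(name);
        if (st == null) {
            // getInstanceOf() returns null rather than throwing when the template is not defined
            throw new IllegalArgumentException("No such template: " + name);
        }
        if (varMap != null) {
            for (Map.Entry<String, Object> entry : varMap.entrySet()) {
                try {
                    st.add(entry.getKey(), entry.getValue());
                } catch (IllegalArgumentException e) {
                    // ST throws this when the template does not declare the attribute
                    LOG.severe("Error adding attribute: " + name + ":" + entry.getKey() + " [" + e.getMessage() + "]");
                }
            }
        }
        return st.render();
    }
}
```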
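The `Render` class covers the output side. On the calling side, "near one-liners called from the visitor onEntry and onExit routines" might look like the following self-contained sketch, which mimics an ANTLR4 listener so it compiles without ANTLR: a walker fires `enter`/`exit` callbacks depth-first, and each callback is a single emitting call. The `Node` record, the `IF`/`PRINT` node kinds, and `GroovyListener` are hypothetical; with ANTLR4 the callbacks would be the generated `enterX`/`exitX` methods (one pair per grammar rule), and the emitting calls would go through something like a `Render` instance's `gen(...)`.

```java
import java.util.List;

public class ListenerSketch {

    /** Minimal stand-in for a parse-tree node: a kind, its source text, and children. */
    record Node(String kind, String text, List<Node> kids) {}

    /** Listener-style callbacks, like the enter/exit pairs ANTLR4 generates per grammar rule. */
    interface Listener {
        void enter(Node n);
        void exit(Node n);
    }

    /** Depth-first walk: enter before the children, exit after them (like ParseTreeWalker). */
    static void walk(Listener l, Node n) {
        l.enter(n);
        for (Node k : n.kids()) {
            walk(l, k);
        }
        l.exit(n);
    }

    /** Emits Groovy; each callback is essentially a one-liner. */
    static final class GroovyListener implements Listener {
        private final StringBuilder out = new StringBuilder();
        private int depth = 0;

        @Override public void enter(Node n) {
            switch (n.kind()) {
                case "IF" -> { line("if (" + n.text() + ") {"); depth++; }
                case "PRINT" -> line("println(" + n.text() + ")");
                default -> { }
            }
        }

        @Override public void exit(Node n) {
            if (n.kind().equals("IF")) {
                depth--;
                line("}");
            }
        }

        // Writes one indented line of output
        private void line(String s) {
            out.append("    ".repeat(depth)).append(s).append('\n');
        }

        String result() {
            return out.toString();
        }
    }

    public static void main(String[] args) {
        Node tree = new Node("IF", "x > 0", List.of(new Node("PRINT", "'positive'", List.of())));
        GroovyListener g = new GroovyListener();
        walk(g, tree);
        System.out.print(g.result());
    }
}
```

Emitting the opening line on entry and the closing brace on exit is what keeps each callback that small; the walker supplies the nesting.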

@GRosenburg has it right: if your language maps nicely everywhere to the target, and you don't care about code efficiency, you can do "on-the-fly" (or on-the-tree-walk) code generation. The problem is that you, the translator designer, have to decide in advance if this is true. If you decide so, and you are right, then this kind of translator is practical. The real problem is you are making a bet that you are right. If you turn out to be mistaken, your whole translator implementation has to be replaced, and that's a pretty high cost. So, you can bet and pray. Or, use strong foundations. — Ira Baxter, Mar 23 '15 at 19:14
@GRosenburg makes a crucial point: you can narrow the semantic gap between your source language and your target language by building additional infrastructure in your target environment to simulate the original language semantics. (If you go far enough this way, you'll end up building what amounts to an interpreter for the original language.) This is the only good way to minimize the chance of a semantic gap you can't overcome, IMHO. — Ira Baxter, Mar 23 '15 at 19:23
