How to translate Three Address Code (TAC) to Java Bytecode?

Question

I would like to translate a plain Three Address Code file to Java Bytecode. There are some questions related to this topic already, but either they are not answered properly or the question goes way beyond what I'm looking for.

Take for instance this segment of code, generated with the front end of the compiler available in the "Dragon Book":

L1:L3:  i = i + 1
L5:     t1 = i * 8
        t2 = a [ t1 ]
        if t2 < v goto L3
L4:     j = j - 1

What would it look like in bytecode? Do I need to reconstruct the symbol table to do the translation? It would be really helpful if someone could describe it like blackcompe did in this answer (I know the JVM is a stack machine, not a register one).

score 2 · Accepted Answer · answered Nov 10 '17 at 04:57


Here's how I would write your code in bytecode. This is just one way to do it, as the question is pretty open-ended. I'm assuming that all the variables are ints except for a, which is an int array. If they have different types, the required code would look different.

; assume i, j, a, and v are in slots 0-3 respectively
L3: 
iinc 0 1
iload_0
bipush 8
imul
; store t1 in a variable for simplicity - you could simplify the code by eliminating the temporary
istore 4
aload_2
iload 4
iaload
istore 5
iload 5
iload_3
if_icmplt L3
iinc 1 -1

As mentioned, this is a pretty open-ended question. For example, the above code explicitly stores the temporary variables into local slots (aka "registers") in order to match the TAC exactly. But you could simplify the code by rearranging things to avoid the temporaries, as shown below:

; assume i, j, a, and v are in slots 0-3 respectively
L3: 
iinc 0 1
aload_2
iload_0
bipush 8
imul
iaload
iload_3
if_icmplt L3
iinc 1 -1

Antimony


Thanks for your answer! I see it is not as straightforward as I thought it would be at first. Do you think there is anything already implemented for this (considering that the "dragon book" is pretty old by now)? I also see you are the creator of Krakatau. Do you think your tools could help me in any way to create a translator from TAC to Bytecode? – joaofbsm Nov 10 '17 at 05:27
@Joao Martins: actually, it is *very* straight-forward. But even the first variant of this answer is already an optimized variant of what a simple (not to say naive) translator would produce. It’s just that each TAC instruction maps to multiple instructions, but still, the resulting byte code can be more compact than a TAC representation (that allows the same amount of variables). Don’t be misguided by the assembly representation, which is fine for human communication, whereas no tool would use this textual representation as intermediate step… – Holger Nov 10 '17 at 11:22
@Holger, thanks for your comment, it was very clarifying. What source materials could I look at to help me build this naive translator? Do you know of any similar implementations? Also, about the symbol table, do I need to reconstruct it to do the translation? – joaofbsm Nov 10 '17 at 13:04
@Joao Martins: local variables are addressed using an index only, so all you have to do, is to assign an index to each symbol. Then, every TAC instruction of the form `a := b op c` gets translated to `[load b], [load c], [op], [store a]` (using the indices of `a`, `b`, `c`). See also [JVMS§2.6.1](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-2.html#jvms-2.6.1) and [JVMS§6](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.html). For symbols that are assigned exactly once, you may skip reserving a local variable and inline their defining expression into the use site. – Holger Nov 10 '17 at 13:38
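The scheme described in the comment above can be sketched roughly as follows. This is a minimal, hypothetical Java sketch, not anyone's actual translator: the class `NaiveTacTranslator` and its methods are invented for illustration. It assumes every symbol is an int, handles only instructions of the form `a = b op c`, assigns local variable indices in order of first use, and emits Jasmin-style text with `ldc` for integer constants.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NaiveTacTranslator {
    // Symbol name -> local variable index (assumption: all symbols are ints).
    private final Map<String, Integer> slots = new HashMap<>();

    // TAC operator -> JVM int arithmetic instruction.
    private static final Map<String, String> OPS =
            Map.of("+", "iadd", "-", "isub", "*", "imul", "/", "idiv");

    // Returns the index for a symbol, assigning the next free one on first use.
    private int slotOf(String symbol) {
        Integer slot = slots.get(symbol);
        if (slot == null) {
            slot = slots.size();
            slots.put(symbol, slot);
        }
        return slot;
    }

    // Integer literals are pushed as constants, symbols are loaded from their slot.
    private String load(String operand) {
        if (operand.matches("-?\\d+")) {
            return "ldc " + operand;
        }
        return "iload " + slotOf(operand);
    }

    // Translates one "a = b op c" instruction to [load b], [load c], [op], [store a].
    public List<String> translate(String tac) {
        String[] t = tac.trim().split("\\s+");
        if (t.length != 5 || !t[1].equals("=") || !OPS.containsKey(t[3])) {
            throw new IllegalArgumentException("unsupported TAC: " + tac);
        }
        List<String> out = new ArrayList<>();
        out.add(load(t[2]));
        out.add(load(t[4]));
        out.add(OPS.get(t[3]));
        out.add("istore " + slotOf(t[0]));
        return out;
    }

    public static void main(String[] args) {
        NaiveTacTranslator translator = new NaiveTacTranslator();
        for (String line : List.of("i = i + 1", "t1 = i * 8")) {
            translator.translate(line).forEach(System.out::println);
        }
    }
}
```

Array accesses, labels, and conditional jumps would need their own cases; they are left out here to keep the mapping from one TAC instruction to its load/op/store sequence visible.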
Thank you very much for your time guys, you are awesome! – joaofbsm Nov 10 '17 at 13:49
@Holger, technically you can in some cases eliminate variables that are used multiple times as well using `dup`, `swap`, etc. But such stack juggling is more trouble than it's worth, and bytecode size is usually not important. – Antimony Nov 10 '17 at 20:12
@Antimony, there is one thing bothering me: if I translate my TAC to Bytecode, how can I then run it on the JVM? I know that I can assemble a Jasmin file to .class bytecode with your tools (Krakatau), but that doesn't seem possible with human readable bytecode (**.bc**). Should I use Jasmin then? – joaofbsm Nov 11 '17 at 05:14
Jasmin/Krakatau IS a human readable format for representing bytecode. There may be others, but I don't keep track (I've never heard of .bc files). – Antimony Nov 11 '17 at 07:38
@Antimony Thanks again! – joaofbsm Nov 11 '17 at 12:57
@JoaoMartins: My recommendation is to forget about generating a textual representation that has to be parsed again. Just generate *bytecode*, i.e. a class file, in the first place. – Holger Nov 12 '17 at 22:16
@Holger, I thought about this at first, but I ended up creating a translator for the extended Jasmin syntax used by Antimony's Krakatau. It is already implemented and works for the entire language described in the Dragon Book. You can see its code here https://github.com/joaofbsm/smallL/tree/master/code/translator . It is not optimized, obviously, and, for the time being, I'm setting the max stack and locals array size to a predefined size. As the language doesn't have function calls, it is actually pretty easy, and all I need to do is write a single main function. Thanks anyway :D – joaofbsm Nov 12 '17 at 22:21
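On the predefined max stack size: when the operand stack is empty at every label and after every branch, as it is in a naive per-instruction translation, a single linear pass over the instructions is enough to estimate the required depth. The following is a hypothetical sketch under that assumption. The class `MaxStackEstimator` is invented for illustration, and its effect table only covers the int instructions that appear in this thread; code with values left on the stack across jumps would need a proper control-flow analysis instead.

```java
import java.util.List;
import java.util.Map;

public class MaxStackEstimator {
    // Net change in operand stack depth for each supported mnemonic.
    private static final Map<String, Integer> EFFECT = Map.ofEntries(
            Map.entry("iload", 1), Map.entry("aload", 1),
            Map.entry("ldc", 1), Map.entry("bipush", 1),
            Map.entry("iadd", -1), Map.entry("isub", -1),
            Map.entry("imul", -1), Map.entry("idiv", -1),
            Map.entry("iaload", -1),    // pops array ref and index, pushes value
            Map.entry("istore", -1),
            Map.entry("iinc", 0),       // works directly on a local variable
            Map.entry("if_icmplt", -2)  // pops both compared ints
    );

    // Linear scan; assumes the stack is empty at every label and branch.
    public static int maxStack(List<String> code) {
        int depth = 0;
        int max = 0;
        for (String raw : code) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith(";") || line.endsWith(":")) {
                continue; // skip blanks, comments and labels
            }
            // Normalize short forms such as iload_0 to their base mnemonic.
            String mnemonic = line.split("\\s+")[0].replaceAll("_\\d$", "");
            Integer effect = EFFECT.get(mnemonic);
            if (effect == null) {
                throw new IllegalArgumentException("unknown instruction: " + line);
            }
            depth += effect;
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> code = List.of(
                "L3:", "iinc 0 1", "aload_2", "iload_0", "bipush 8",
                "imul", "iaload", "iload_3", "if_icmplt L3", "iinc 1 -1");
        System.out.println(maxStack(code));
    }
}
```

The number of locals can be obtained more simply, as one more than the highest index handed out while assigning slots to symbols (for int-sized values).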
