4

Is it possible to decompile JVM languages like Groovy or Scala back to their original source form?

When I decompile a 4-line Groovy class, I get about 20 lines of decompiled Java code.

It's more of a theoretical question than a practical one, because there are no such decompilers on the market (not that I know of).

Thanks.

Alexandru Luchian

3 Answers

3

Yes, it's possible, with about the same fidelity that a Java decompiler can manage (meaning: the code will look similar, but not necessarily identical).

You'd need a dedicated decompiler for each language, however.

Edit: I think I need to clarify what level of fidelity I'd expect:

  • The names of local variables may or may not be reproducible
  • Loop types might be misinterpreted (`for` replaced by `while`, ...)
  • More generally: things that can be done in two similar ways might be misinterpreted
  • ...

All of those are errors that also occur on decompiling Java code, simply because the association from byte code to Java source code is not 1:1.
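To illustrate the loop point, here is a minimal sketch with two hypothetical methods (`sumWithFor` and `sumWithWhile` are names made up for this example). They behave identically, and a compiler such as `javac` would typically emit very similar bytecode for both, so a decompiler has little to go on when choosing which loop form to print:

```java
public class LoopShapes {
    // Written with a for loop.
    static int sumWithFor(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Same logic written with a while loop. A decompiler that sees the
    // compiled form of either method may legitimately print this version.
    static int sumWithWhile(int[] values) {
        int total = 0;
        int i = 0;
        while (i < values.length) {
            total += values[i];
            i++;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumWithFor(data) == sumWithWhile(data));
    }
}
```

Comparing the output of `javap -c LoopShapes` for the two methods is one way to check how close the bytecode is with your own compiler version.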

However, if you had a dedicated Groovy decompiler, I strongly suspect it would produce much more readable code from compiled Groovy classes than a Java decompiler ever could.

Joachim Sauer
  • A very optimistic assumption, indeed. – Ingo Apr 20 '11 at 14:06
  • @Ingo: I don't see how this is optimistic. The code produced by a competent Java decompiler bears only a basic similarity to the original source as well. Local variable names are usually lost (unless explicitly included in the `.class` files during compilation), and more often than not the specific type of loop used is misrepresented as well (`for` loops replaced with `while` loops). – Joachim Sauer Apr 20 '11 at 14:08
  • 1
    I elaborated on my different opinion in my post. – Ingo Apr 20 '11 at 14:10
  • 1
    @Ingo: I don't think our opinions are too different. I simply interpreted the question as "Can a dedicated Groovy decompiler be as useful as a dedicated Java decompiler?", to which I say: "yes, it can be equally useless" ;-) – Joachim Sauer Apr 20 '11 at 14:12
1

It is not necessarily possible. For example, a language could mangle its names in a fashion that is not reversible. It could also map different constructs of the source language to a single Java language construct.

Most importantly, however, the target representation (the Java language or the JVM bytecode) might not be powerful enough to encode certain concepts or constructs of the source language in such a way that they can be recovered. This is already the case with Java and the JVM bytecode, where the latter is not capable of fully expressing generics.
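A minimal sketch of the generics case (the class and method names are made up for illustration): at run time a `List<String>` and a `List<Integer>` are the same class, because type arguments are erased during compilation. Declared signatures of classes, fields and methods are kept as metadata in the class file, but the executable code itself no longer distinguishes the two.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // Returns true when both lists share one runtime class,
    // i.e. the type arguments did not survive compilation.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
    }
}
```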

Ingo
  • For your second point, I didn't mean converting the Java bytecode generated by Groovy back to Java, but back to Groovy. – Alexandru Luchian Apr 20 '11 at 15:32
  • @Alexandru - That's exactly my point. I do not know Groovy, but I can imagine that it has certain features that cannot be compiled to bytecode without losing information. One example of a language that loses information when compiled to bytecode is, ironically, Java itself, so it is reasonable to assume that this will be even more the case with other languages. – Ingo Apr 20 '11 at 17:26
0

Well, the only problems I can think of are compiler optimizations and comments. Comments aren't preserved in the bytecode (thankfully), and the compiler may restructure the code for better performance. Besides that, it seems possible.

Kyle Sletten
  • I am not worried about comments. But how are you going to restore something like this? `this.metaClass = ((MetaClass)ScriptBytecodeAdapter.castToType(tmp12_9, $get$$class$groovy$lang$MetaClass())); tmp12_9; while (true) return;` – Alexandru Luchian Apr 19 '11 at 21:15
  • 2
    @Alexandru: that's not the byte code. That's how a **Java language decompiler** tries to interpret byte code that was not produced by compiling Java language code. This is bound to produce strange results, but a dedicated decompiler for the correct language would recognize the "strange" bytecode construct and know what language-construct it represents. – Joachim Sauer Apr 20 '11 at 13:33
  • @Joachim, you are right that it is not the byte code; that's why I didn't say it is. I guess it makes sense: if you know you are dealing with a Groovy-generated class file, you know you have to handle it differently than a Java-generated class file. – Alexandru Luchian Apr 20 '11 at 14:46