
I've hit a wall while trying to write a simple compiler in Java using ASM. I am trying to concatenate character strings, and cannot work out why my code fails to do so. The problem lies in how the following lines of code compile:

char[] p;
p = "Hi";
p = p + i[0];

Where i is an initialized array. The line p = "Hi"; compiles as:

bipush 2;
newarray t_char;
dup;
bipush 0;
ldc h;
castore;
dup;
bipush 1;
ldc i;
castore;

Note that I am deliberately treating the string "Hi" as a char array, instead of directly as a String object. When decompiled, it reads as:

Object localObject1 = { 'H', 'i'};

Since {'H', 'i'} is not a valid initializer for an Object, the program does not run. What confuses me, and the reason I came to Stack Overflow, is that when the line p = p + i[0]; is removed from the program, or replaced with one that does not use an array, such as p = p + 5;, the line p = "Hi"; compiles in exactly the same way:

bipush 2;
newarray t_char;
dup;
bipush 0;
ldc h;
castore;
dup;
bipush 1;
ldc i;
castore;

And when decompiled, the same line reads as:

char[] arrayOfChar1 = {'H', 'i'};

The program runs just fine. I have absolutely no idea what is going on here, or how to solve it. To decompile the .class files, I am using this decompiler. I would like to know why the exact same bytecode decompiles differently in these two cases.

Luke Sykpe
  • What's the question? – shmosel Jun 18 '18 at 21:08
  • @shmosel I'd like to know why the exact same bytecode decompiles differently in these 2 cases. – Luke Sykpe Jun 18 '18 at 21:12
  • That would depend on the decompiler. You haven't even said which one you're using. – shmosel Jun 18 '18 at 21:13
  • Sorry about that. Edited. – Luke Sykpe Jun 18 '18 at 21:16
  • Uh, I'm still not clear on what you're doing, but you _cannot_, in Java, treat `char[]` and `String` interchangeably. You can't assign one to the other. Things in between `"` are `String`s and cannot be assigned to character arrays. This isn't something that generating your bytecode will go around. But if you're writing your own language where `""` are compiled into character arrays...I guess that could work? – Louis Wasserman Jun 18 '18 at 21:25
  • I am aware. That is not what I'm trying to do. I'm wondering why the `newarray t_char` bytecode instruction pushes an array reference onto the stack in the second case, but not in the first one. – Luke Sykpe Jun 18 '18 at 21:28
  • But as far as "why the exact same bytecode decompiles differently," it's going to be because the variable being stored into has a different type, which is represented in a different part of the bytecode, not the implementation you pasted. – Louis Wasserman Jun 18 '18 at 21:28
  • You mean `p`? But that's declared right above the line I'm having trouble with, in both cases. As I produce no bytecode for declaring variables, the very first time that gets a value assigned to it should be `newarray t_char`. – Luke Sykpe Jun 18 '18 at 21:31
  • The type _created_ and the type _of the variable_ are different. The type of the variable is in a different place. – Louis Wasserman Jun 18 '18 at 21:35
  • In reply to your edited comment, yes, sorry that's exactly what I'm trying to do. The lines of code are in "my" language, not Java. In the compiler I'm writing, String literals are treated as `char[]`. Should've probably specified that beforehand. – Luke Sykpe Jun 18 '18 at 21:36
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/173361/discussion-between-louis-wasserman-and-luke-sykpe). – Louis Wasserman Jun 18 '18 at 21:36
  • In addition to the instructions, there is meta-data about the variables used in each method. Is the local variable table declaring `Object` in the first case and `char[]` in the second? – erickson Jun 18 '18 at 22:36
  • There are no variables in the bytecode you have posted. You are asking a question about exactly those parts of the code you have not shown. – Holger Jun 21 '18 at 09:35

1 Answer


In general, you cannot expect to be able to recompile decompiled code. Compilation and decompilation are both lossy processes. In particular, bytecode does not have to contain explicit types the way Java source code does, and the type checking rules for bytecode are much laxer than the source-level type system.
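To make the gap between the two type systems concrete, here is a minimal sketch in plain Java. The class name `LocalTypeDemo` and the method `makeHi` are made up for this illustration. `makeHi` is roughly what the `newarray` / `castore` sequence in the question does. The array it creates is a `char[]` either way; only the declared type of the variable that holds it differs, and that declared type is exactly what a decompiler has to reconstruct.

```java
public class LocalTypeDemo {
    // Roughly the Java equivalent of: bipush 2; newarray t_char; dup; ...; castore
    static char[] makeHi() {
        char[] a = new char[2];
        a[0] = 'H';
        a[1] = 'i';
        return a;
    }

    public static void main(String[] args) {
        // Both declarations are legal: the value is a char[] in each case,
        // only the static type of the variable differs.
        Object asObject = makeHi();
        char[] asChars = makeHi();

        System.out.println(asObject.getClass().getSimpleName()); // prints "char[]"
        System.out.println(asChars.length);

        // The brace shorthand, however, is only allowed when the declared
        // type is an array type, so a guessed declaration of the form
        //     Object o = {'H', 'i'};
        // does not compile, while this one does:
        Object explicit = new char[] {'H', 'i'};
        System.out.println(explicit instanceof char[]);
    }
}
```

So a decompiler that guesses `Object` for the variable and still prints the brace shorthand produces source that javac rejects, even though the underlying instructions are the same.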

This means that when decompiling the code, the decompiler has to guess the types of local variables (unless the optional debugging metadata was included with the compiled class). In one case it guessed Object, which led to a compilation error. In the other it guessed char[]. If you want a more in-depth explanation, you could dive into the decompiler source code, but the real issue is expecting the decompiler to magically give good results in the absence of type information in the first place.
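The "optional debugging metadata" is, for local variables, the `LocalVariableTable` attribute. As a rough illustration, the sketch below compiles a tiny hypothetical class (`Sample`, invented for this example) twice with the JDK's own compiler, once with `-g:none` and once with `-g`, and checks whether the attribute name shows up in the resulting class file. It assumes it is run on a JDK rather than a bare JRE, and the byte search is a shortcut for the demo, not a proper class file parser.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DebugInfoDemo {
    // Hypothetical sample source, used only for this illustration.
    static final String SOURCE =
        "public class Sample {\n" +
        "    static int length() {\n" +
        "        char[] p = {'H', 'i'};\n" +
        "        return p.length;\n" +
        "    }\n" +
        "}\n";

    // Compiles SOURCE with the given debug flag and reports whether the
    // class file mentions the LocalVariableTable attribute.
    static boolean hasLocalVariableTable(String debugFlag) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system compiler; a JDK is required");
        }
        try {
            Path dir = Files.createTempDirectory("debuginfo");
            Path src = dir.resolve("Sample.java");
            Files.write(src, SOURCE.getBytes(StandardCharsets.UTF_8));

            int status = javac.run(null, null, null,
                    debugFlag, "-d", dir.toString(), src.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation failed");
            }

            // Attribute names are stored as plain text in the constant pool,
            // so a simple substring search is enough for this demo.
            byte[] classBytes = Files.readAllBytes(dir.resolve("Sample.class"));
            String raw = new String(classBytes, StandardCharsets.ISO_8859_1);
            return raw.contains("LocalVariableTable");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("-g:none -> " + hasLocalVariableTable("-g:none"));
        System.out.println("-g      -> " + hasLocalVariableTable("-g"));
    }
}
```

A compiler built on ASM can emit the same attribute itself by calling `MethodVisitor.visitLocalVariable` with the descriptor `[C` for a `char[]` local. Whether a particular decompiler actually uses that information is something to check against the tool in question.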

Anyway, if you want to edit already compiled code, you shouldn't use a decompiler. Your best bet is to use an assembler/disassembler pair like Krakatau, which allows you to edit classfiles losslessly at the bytecode level (assuming you understand bytecode).

Antimony