
[TL;DR: the following JVM bytecode instructions seem not to work:

iconst_0
istore 6
...sequential
iinc 6 1
jsr L42
...
; L42
iload 6
ifeq L53 ; Always branches!!!
astore 8
iinc 6 -1
; L53
ldc 100
isub     ; ERROR, returnAddress is at the top of the stack

A test .class can be found here (with slightly more complex logic). If you want to know more about why I'm seeing these instructions, please keep reading.]

I'm writing a Whitespace compiler targeting JVM bytecode. Although it is an esoteric language, Whitespace describes an interesting set of assembly-like instructions for a stack machine, which maps nicely to the JVM.

Whitespace has labels, which are both targets for jump (goto/jump-if-zero/jump-if-negative) and function calls. The relevant instructions (with names given by me, in the spec they are given as combinations of space, tabs and newlines) are:

  • mark <label>: sets a label for the following instruction
  • jump[-if-neg|-if-zero] <label>: jumps unconditionally or conditionally to the given label
  • call <label>: calls the function that the label points to
  • end <label>: ends a function, returning to the caller.

My compiler outputs the whole Whitespace program in the main method of a class. The simplest way to implement call and end is using the JSR and RET opcodes, which are made to implement subroutines. After a JSR operation the stack will contain a returnAddress reference that should be stored in a variable for later use in end.

However, as mark can be either call-ed or jump-ed into, the stack may or may not contain the returnAddress reference. I decided to use a boolean variable (call-bit, at address 6) to store how the mark was reached, and then test if it should store the top of the stack into a local variable (return-address, at address 8). The implementation for each instruction is as follows:

; ... initialization
iconst_0
istore 6 ; local variable #6 holds the call bit

# call
iinc 6 1 ; sets the call bit
jsr Lxxx ; jumps to the given label, pushing a returnAddress to the stack

# mark
; Lxxx
iload 6       ; loads the call bit
ifeq Lxxx-end ; SHOULD jump to mark's end if the call bit is not set
; call bit is set: mark was call-ed and returnAddress is in the stack
astore 8      ; stores returnAddress to local variable #8
iinc 6 -1     ; resets the call bit
; Lxxx-end

# end
ret 8 ; returns using the stored returnAddress

The problem: ifeq ALWAYS branches. I also tried reversing the logic (call-bit -> jump-bit, ifeq->ifne), and even simply switching to ifne (which would be wrong)... but the if always branches to the end. After a call, the returnAddress stays on the stack and the next operation blows up.

I've used ASM's analyzer to watch the stack while debugging all this, but I have only confirmed this behavior and can't find what I'm doing wrong. My one suspicion is that there's more to iinc or to ifeq than my vain philosophy can imagine. I'll admit that I've only read the instruction set page and the pertinent ASM documentation for this project, but I hope that someone can offer a solution off the top of their head.

In this folder there are the relevant files, including the executable class and the original Whitespace, as well as the output of javap -c and ASM analysis.

Bruno Kim
  • In your first snippet you jump with `ifne` but in your second snippet, with otherwise identical code, you jump with `ifeq`. Is that intentional? `ifeq` seems to be the right opcode for what you want, so why is it not in the first snippet? – Erwin Bolwidt Apr 03 '15 at 07:13
  • Thanks for noticing, fixed the first snippet – Bruno Kim Apr 03 '15 at 07:14
  • Second question: how do you know it always branches? Are you single-stepping through the bytecode? Otherwise you're probably observing in some indirect way - which is also where the problem could be. – Erwin Bolwidt Apr 03 '15 at 07:14
  • I'm following the execution shown in the ASM analysis, that shows the state of the stack and variables for each opcode; and have also tried with printf-oriented debugging, printing which branch the code took. Additionally, ASM shows which instructions were not visited with a ?. But yes, a bytecode single-stepping would be a nice tool to use. – Bruno Kim Apr 03 '15 at 07:17
  • I was thinking of weirdness because of the RET instruction but not so likely. Can you emit bytecode to print the value of local variable 6 just before the `ifeq`? Maybe your increment/decrement is not returning it back to zero. – Erwin Bolwidt Apr 03 '15 at 07:30
  • Partially done, I've added to the folder. However, I can't see the output, as the JVM barfs a VerifyError and outputs nothing. Before, I was exiting the method just before an error. Trying to do the same now. – Bruno Kim Apr 03 '15 at 08:11
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/74397/discussion-between-bruno-kim-and-erwin-bolwidt). – Bruno Kim Apr 03 '15 at 08:11

1 Answer

Found a possible reason: the problem is not during execution, but with the verifier. When it seemed that it "always branched", it was in fact the verifier testing all possible outcomes of an if so it could be sure the stack would look the same on both paths. My code relies on a reference (returnAddress) that may or may not be present on the stack, and the verifier can't check that.

That said, the sample code does not run even with the -noverify flag, but other, simpler examples that failed verification did execute correctly.
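To make the verifier's complaint concrete, here is a minimal Java sketch of the kind of check involved. It is not the real verifier: it only tracks stack depth (not types), and the `Insn` class, the `MARK` table, and `firstFailure` are hypothetical names invented for this illustration. It follows both outcomes of every branch, ignoring the actual value of the call bit, and reports the first instruction that two paths reach with different stack depths.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class StackDepthCheck {
    /** Hypothetical mini-instruction: slots popped, slots pushed, branch target (-1 if none). */
    static final class Insn {
        final String name;
        final int pops;
        final int pushes;
        final int target;

        Insn(String name, int pops, int pushes, int target) {
            this.name = name;
            this.pops = pops;
            this.pushes = pushes;
            this.target = target;
        }
    }

    // The "mark" snippet from the question, numbered 0..5; index 4 plays the role of L53.
    static final Insn[] MARK = {
        new Insn("iload 6", 0, 1, -1),
        new Insn("ifeq L53", 1, 0, 4),
        new Insn("astore 8", 1, 0, -1),
        new Insn("iinc 6 -1", 0, 0, -1),
        new Insn("ldc 100", 0, 1, -1),
        new Insn("isub", 2, 1, -1),
    };

    /**
     * Walks every path through the code, taking BOTH outcomes of each branch.
     * Returns the index of the first instruction that underflows the stack or is
     * reached with two different depths, or -1 if all paths agree.
     */
    static int firstFailure(Insn[] code, int entryDepth) {
        int[] depthAt = new int[code.length + 1];
        Arrays.fill(depthAt, -1);           // -1 means "not visited yet"
        depthAt[0] = entryDepth;
        Deque<Integer> work = new ArrayDeque<>();
        work.push(0);
        while (!work.isEmpty()) {
            int pc = work.pop();
            if (pc == code.length) {
                continue;                   // fell off the end of the snippet
            }
            Insn in = code[pc];
            if (depthAt[pc] < in.pops) {
                return pc;                  // not enough operands on the stack
            }
            int after = depthAt[pc] - in.pops + in.pushes;
            int[] successors = in.target >= 0
                    ? new int[] {pc + 1, in.target}
                    : new int[] {pc + 1};
            for (int next : successors) {
                if (depthAt[next] == -1) {
                    depthAt[next] = after;  // first visit: record the depth
                    work.push(next);
                } else if (depthAt[next] != after) {
                    return next;            // two paths disagree at this merge point
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Entry depth 1 models arriving via jsr, with a returnAddress on the stack.
        System.out.println(firstFailure(MARK, 1));
    }
}
```

With an entry depth of 1, the sketch reports index 4, the `ldc` at L53: the branch path arrives there with the returnAddress still on the stack, while the fall-through path has already stored it. The run-time value of local variable 6 never enters into it.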

Bruno Kim
  • Keep in mind that you are riding a dead horse here. Starting with class file version 51 the instructions `jsr` and `ret` are no longer supported. Not wanting to verify code that may be both, *jumped* and *called* to, was the motivation to remove that feature… – Holger Apr 08 '15 at 14:46
  • Thanks for the input, I'm reworking my compiler to not depend on those instructions. I embarked on this thinking that JVM bytecode was simple and zen, and boy how much I'm learning! – Bruno Kim Apr 08 '15 at 15:50
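One possible way to do without `jsr` and `ret` (an assumption on my part, not necessarily what the reworked compiler does) is to keep an explicit stack of integer return points and dispatch on it with a switch. The Java sketch below models the idea at source level; the label numbers 0, 1 and 10, the `returns` stack, and the class name are all hypothetical. A call pushes the id of the instruction to resume at and then jumps, a plain jump pushes nothing, and end pops the id and jumps back to it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ReturnStackSketch {
    /** Runs a tiny hard-coded program and returns a trace of the blocks visited. */
    static String run() {
        StringBuilder trace = new StringBuilder();
        Deque<Integer> returns = new ArrayDeque<>(); // explicit return-point stack
        int pc = 0;
        while (true) {
            switch (pc) {
                case 0:                      // call 10: remember where to resume, then jump
                    trace.append("main;");
                    returns.push(1);
                    pc = 10;
                    break;
                case 1:                      // jump 10: a plain jump pushes nothing
                    trace.append("back;");
                    pc = 10;
                    break;
                case 10:                     // mark 10 ... end
                    trace.append("sub;");
                    if (returns.isEmpty()) {
                        // Simplification for this sketch: end with no caller stops the program.
                        return trace.toString();
                    }
                    pc = returns.pop();      // end: resume at the saved return point
                    break;
                default:
                    throw new IllegalStateException("bad pc " + pc);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the return point is an ordinary int kept off the operand stack, the code at a mark sees the same operand stack whether it was called or jumped to, so there is nothing for the two paths to disagree about.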