
I'm a complete novice with bytecode VMs, so forgive the basic question.

While digging into resources to learn more about this, I came across Terence Parr's great video, which walks through a stack VM example from his book "Language Implementation Patterns". I follow most of it, but when the video gets to interpreting function calls, one specific thing confused me. When we "jump" to a function definition within the code memory array, the function's compiled instructions are in the same "code" memory array as the rest of the code. Is the implication that functions compile their bytecode to a separate region of the code array (for example, after everything else)? If not, wouldn't you end up "running into" compiled function bytecode while incrementing through the code array?

For example, imagine the first few lines of the program are something like:

x = 10
define somefunc(x,y){dostuff}
y = 9
...more code ...

If during compilation we simply step through the source-code sequentially, we would end up compiling the function's bytecode in the middle of our "normal" code. Then while interpreting, the VM would "run into" that code.
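One way a compiler could walk the source in order without causing this is to emit a branch over each function body, so the program counter skips the body unless a call jumps into it. The sketch below is only an illustration under my own assumptions: the opcodes (`ICONST`, `GSTORE`, `BR`, `RET`, `HALT`), the statement tuples, and `compile_program` are hypothetical names and are not taken from the video or the book.

```python
# Hypothetical opcodes for illustration; not the instruction set from the video.
ICONST, GSTORE, BR, RET, HALT = range(5)


def compile_program(statements):
    """Compile statements in source order, branching over each function body.

    Each statement is either ("assign", global_slot, value) or
    ("def", name, body_code), where body_code is already-compiled bytecode.
    Returns (code, function_addresses).
    """
    code = []
    functions = {}
    for stmt in statements:
        if stmt[0] == "assign":
            _, slot, value = stmt
            code += [ICONST, value, GSTORE, slot]
        elif stmt[0] == "def":
            _, name, body = stmt
            code += [BR, 0]              # placeholder target, patched below
            patch_at = len(code) - 1
            functions[name] = len(code)  # the body starts at this address
            code += body + [RET]
            code[patch_at] = len(code)   # branch lands just past the body
        else:
            raise ValueError(f"unknown statement kind: {stmt[0]}")
    code.append(HALT)
    return code, functions


program = [
    ("assign", 0, 10),        # x = 10
    ("def", "somefunc", []),  # define somefunc(x, y) { dostuff }
    ("assign", 1, 9),         # y = 9
]
code, functions = compile_program(program)
```

With this layout the `BR` at index 4 targets index 7, so sequential execution goes from `x = 10` straight to `y = 9`, while `functions["somefunc"]` records address 6 for later calls.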

The video doesn't cover compilation as that is handled by ANTLR, so I'm taking the VM at face value and trying to infer how the compilation part works (for this particular VM). I'm sure at some point further along in my learning I will understand the compilation piece better, but for now I'm trying to understand this specific example.

I could see it working two ways:

  • Compiled function bytecode lives inside each function object, which is a VM-level primitive. We look up the function's address in the constant pool, and the function object stored there holds the compiled bytecode for that particular function. This would differ from what's in the video, since the function bytecode would not be part of the main "code" array. I don't know if this is a common pattern; it's just one way I could imagine it working.
  • Compiled function bytecode has its own "space" in the code array, either before or after everything else. That code would never be reached while stepping through the code array normally, only when we explicitly jump to it.
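To make the second option concrete, here is a rough sketch of a tiny stack VM in Python where the top-level code ends in a halt instruction and the function body sits after it. Everything here is an assumption for illustration: the opcode names, the `run` function, the `start_ip` parameter, and the `add_one` routine are hypothetical and are not the VM from the video.

```python
# Hypothetical opcodes for illustration; not the instruction set from the video.
ICONST, GSTORE, GLOAD, IADD, CALL, RET, HALT = range(7)


def run(code, start_ip, nglobals):
    """Interpret `code` starting at `start_ip` and return the globals at HALT."""
    globals_ = [0] * nglobals
    stack = []         # operand stack
    return_addrs = []  # saved return addresses for CALL/RET
    ip = start_ip
    while True:
        op = code[ip]
        ip += 1
        if op == ICONST:
            stack.append(code[ip])
            ip += 1
        elif op == GSTORE:
            globals_[code[ip]] = stack.pop()
            ip += 1
        elif op == GLOAD:
            stack.append(globals_[code[ip]])
            ip += 1
        elif op == IADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == CALL:
            target = code[ip]
            ip += 1
            return_addrs.append(ip)  # resume after the CALL operand
            ip = target              # the only way into function code
        elif op == RET:
            ip = return_addrs.pop()
        elif op == HALT:
            return globals_
        else:
            raise ValueError(f"unknown opcode {op} at address {ip - 1}")


code = [
    ICONST, 10,  # 0:  push 10
    GSTORE, 0,   # 2:  x = 10
    CALL, 11,    # 4:  call add_one
    ICONST, 9,   # 6:  push 9
    GSTORE, 1,   # 8:  y = 9
    HALT,        # 10: top-level code stops here
    # add_one starts at address 11: x = x + 1
    GLOAD, 0,    # 11
    ICONST, 1,   # 13
    IADD,        # 15
    GSTORE, 0,   # 16
    RET,         # 18
]
result = run(code, start_ip=0, nglobals=2)
```

Running from address 0 executes the top-level code, enters `add_one` only through `CALL`, and stops at `HALT` before plain incrementing could reach the function body. Passing a different `start_ip` would equally allow function code to sit before the top-level code.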
Solaxun
  • Did you ever program in a language like Java? Every method has its defined end and there is no “step over into another method”. Further, .java files can be compiled to corresponding .class files individually. There is no insertion of invoked methods into the caller. That’s why programs written for Java 1.0 could still run with Java 18; they do not contain code of the Java 1.0 classes. If you consider other programming languages, like BASIC, there were interpreters that did not prevent execution from running into code supposed to be a subroutine (which would fail when reaching RETURN). – Holger Apr 04 '22 at 08:51
  • @Holger Thanks, but I'm not sure how your answer relates to the question. Of course I understand that this doesn't happen in Java (or any other language). What I don't understand is how the specific bytecode VM example I linked to handles this, since it uses the same array for both compiled functions and other compiled code. I assume the compiler (not shown in the video) has some way of "sectioning off" a portion of the array as reserved for functions - otherwise wouldn't incrementing the program counter result in running into them? – Solaxun Apr 04 '22 at 16:17
  • I’m not sure anyone is willing to watch this long video in its entirety just to find out what “the specific bytecode VM example” does differently from other VMs (if there are any differences). In practice, a lot of real-life VMs are written in languages where *all* arrays are within the same memory, without boundary checks, anyway. So these VMs have to ensure by program logic that they do not run over the end of one code segment, just like any other program. They should know how long a method’s code is. If the method does not end properly, they may reject the code (mandatory for Java) or crash arbitrarily. – Holger Apr 04 '22 at 16:28
  • @Holger I timestamped the specific section. I understand how a method avoids "running into" other code *once called*, in that it has clear boundaries around how many args to pop off the stack, where to return (jump) to based on the saved frame pointer etc. What I don't understand is how, prior to any function/method invocations whatsoever, the VM avoids incrementing the program counter to a section of code that should be a function, since all code is in the same array. – Solaxun Apr 04 '22 at 16:35
  • Why should the VM ever increment the program counter without an actual execution at all? – Holger Apr 04 '22 at 16:38
  • @Holger As part of normal execution? The PC increments some number of bytes depending on the last instruction encountered. In my silly example above, the first instruction would be something like "load 10, gstore x". Maybe that's 4 bytes. The VM executes that code, then moves the PC to the next instruction. What if that next instruction is function code, which should only be reachable by an explicit jump (e.g. as part of a call)? Since it's next in *source-code* order, I assume the compiler would have to somehow make sure to reserve part of the code array for functions only to avoid this. – Solaxun Apr 04 '22 at 16:47
  • You said “prior to any function/method invocations”, now you’re saying “normal execution”. Maybe that’s the problem. I don’t know of any VM that has an execution that is not part of a function or method invocation. In fact, even native code follows the pattern of starting its execution with a function invocation, like the well-known `main` function or method. There is no execution “prior to any function/method invocations”. And each function has a well-defined boundary. Or maybe not. Who says that the VM of that video does protect against running into another function’s code? – Holger Apr 04 '22 at 16:57
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/243592/discussion-between-solaxun-and-holger). – Solaxun Apr 04 '22 at 17:42

0 Answers