Compiler-generated relative addresses and how they are represented in (preferably Java) bytecode?

Question

When address binding is not possible at compile time, it is done at load/link time or at run time, to associate relative addresses (or perhaps we can call them relocatable addresses) with actual physical ones. In addition, the CPU converts those relative addresses to logical ones before they are bound to physical addresses.

Converting from logical to physical addresses is a concept I know. But I am confused about relative addresses (AFAIK, they are called relative because the compiler assigns them relative to zero). I'm not sure what relative addresses are used for in bytecode, whether they are really needed, or whether they are even identical to logical addresses.

You shot the question in the foot by introducing RELOCATABLE. That is an independent concept that is way beyond your grasp at this point. — Bruce David Wilner, Aug 16 '16 at 14:38
@BruceDavidWilner As far as I remember, there might be two types of address binding: relative to logical, and logical to relative. And sometimes the terms relative address and relocatable address are used interchangeably. That's why I used that term. — stdout, Aug 17 '16 at 06:29
@BruceDavidWilner But I agree with your point. Relocatable refers more to addresses that are not yet resolved (if that is what you meant). — stdout, Aug 17 '16 at 08:04
In JVM, branch addresses are PC-relative (i.e., an offset from the branch instruction location). Of course, by the time the method is JITed, those relative addresses do not have any physical meaning at all. — SK-logic, Aug 18 '16 at 08:09
@SK-logic I'm not sure they're bound right after they are JITed. The addresses should still not be physical (but logical). IMO, they should be mapped by the MMU, not the JVM. — stdout, Aug 18 '16 at 09:34
HotSpot may sometimes backtrack on JIT, so a version with a bytecode and bytecode-relative jump targets will persist anyway. The actual x86 jump instructions used in the generated code are also PC-relative, while call addresses are absolute. — SK-logic, Aug 18 '16 at 09:51

score 1 · Answer 1 · answered Aug 16 '16 at 14:38

Java bytecode operates at a much higher abstraction level than native machine code. There's no notion of memory addresses at all - methods are referred to symbolically.

The easiest way to think of Java bytecode is that it is practically 1:1 with the initial version of the Java language. The compiler does some things like converting local variables into numerical indexes and converting control flow into gotos, but for the most part, it is very similar to the original code.

The JVM is responsible for interpreting or compiling the bytecode into native code at runtime.

Antimony

Thanks. I had actually got the concept of JVM abstraction and symbolic references, but I'm trying to go a bit deeper. First of all, what exactly are those relative addresses in the bytecode? For example, can constant pool references (#3, #20, etc.) or bytecode array indexes (the ones on the left-hand side of the instructions: 0:, 2:, 4:) be given as examples? Secondly, the JVM execution engine converts the bytecode to native machine code beforehand. Is that native code directly executable, or does it have to pass through the MMU to be bound to actual physical addresses? – stdout Aug 17 '16 at 07:27

score 1 · Answer 2 · answered Aug 16 '16 at 14:40

Getting the memory addresses of objects is actually pointless within Java, as the JVM manages all of that.

In other words: the JVM "puts" objects wherever it sees fit, and they can even be "moved" around, for example during garbage collection.

So as a Java programmer, you don't care. And if you did care, there is nothing you could do about it.

GhostCat

Yes, I care about it :). And I don't want to do any extra stuff. I'm just trying to understand what's going on under the hood. – stdout Aug 17 '16 at 07:48

Then you will probably have to do a very deep dive into the implementation details of a JVM implementation. – GhostCat Aug 17 '16 at 07:49

score 1 · Accepted Answer · edited May 23 '17 at 12:08

You are mixing up a lot of concepts. A relative address is just an address that needs a base address to be converted to an absolute address. That conversion can happen in a lot of ways. One way is converting them at load time, but they may also just be used together with CPU instructions that intrinsically support relative addressing and do the math right when the memory location needs to be accessed.

If an operating system supports virtual memory, all addresses used within an ordinary process are logical ones, whether they are referenced relatively or absolutely. The conversion from logical to physical addresses is outside the application’s scope and independent of any other concept you are referring to in your question.
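To make the two conversion styles concrete, here is a minimal sketch in Java. It is only an illustration: the class name, the method names, and the base address `0x4000` are made up for this example and do not correspond to any real loader or CPU.

```java
final class RelativeAddressSketch {

    /** Load-time style: rewrite every base-relative offset into an absolute address once. */
    static long[] relocateAll(long base, long[] offsets) {
        long[] absolute = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            absolute[i] = base + offsets[i];
        }
        return absolute;
    }

    /** Access-time style: keep the offset as it is and add the base only when it is used. */
    static long resolve(long base, long offset) {
        return base + offset;
    }

    public static void main(String[] args) {
        long base = 0x4000;                  // hypothetical load address
        long[] offsets = {0x0, 0x10, 0x2C};  // hypothetical relative addresses

        System.out.println(Long.toHexString(relocateAll(base, offsets)[2])); // 402c
        System.out.println(Long.toHexString(resolve(base, 0x10)));           // 4010
    }
}
```

Both variants perform the same addition; they differ only in when it happens.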

The class file format does not operate in terms of memory locations.

If you want to apply the terms “absolute” and “relative” on that higher level, constant pool indices are absolute, as they don’t require a base index to identify the actual index. Still, when you want to find the memory location within the loaded file, you not only have to use the address to which the class file was loaded, you also have to parse the entire constant pool up to the desired item, as constant pool items have different byte sizes. For that reason, items are usually not looked up in the file at all. Instead, the entire pool is converted in a single pass to a JVM-specific representation with constant item sizes, and later on the JVM may look up entries in that table, which is independent of the class file’s memory location.
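The need to walk the pool can be sketched as follows. This is not how any particular JVM does it; the class and method names are hypothetical, and the entry sizes are taken from the constant pool tags in the class file format specification. The method records the byte offset at which each constant pool entry starts, which can only be found by stepping over all preceding entries.

```java
final class ConstantPoolOffsets {

    /** Returns, for each constant pool index, the file offset of its tag byte (-1 for unused slots). */
    static int[] entryOffsets(byte[] classFile) {
        java.nio.ByteBuffer in = java.nio.ByteBuffer.wrap(classFile); // big-endian by default
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        in.getShort();                       // minor_version
        in.getShort();                       // major_version
        int count = in.getShort() & 0xFFFF;  // constant_pool_count
        int[] offsets = new int[count];      // index 0 is never used by the format
        java.util.Arrays.fill(offsets, -1);
        for (int i = 1; i < count; i++) {
            offsets[i] = in.position();
            int tag = in.get() & 0xFF;
            int size;
            switch (tag) {
                case 1:                                   // Utf8: u2 length + bytes
                    size = in.getShort() & 0xFFFF;
                    break;
                case 3: case 4:                           // Integer, Float
                case 9: case 10: case 11: case 12:        // Fieldref, Methodref, InterfaceMethodref, NameAndType
                case 17: case 18:                         // Dynamic, InvokeDynamic
                    size = 4;
                    break;
                case 5: case 6:                           // Long, Double occupy two indices
                    size = 8;
                    i++;
                    break;
                case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                    size = 2;
                    break;
                case 15:                                  // MethodHandle
                    size = 3;
                    break;
                default:
                    throw new IllegalArgumentException("unknown tag " + tag);
            }
            in.position(in.position() + size);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Hand-built, truncated prefix of a class file: header plus three pool entries.
        byte[] prefix = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
            0, 0, 0, 52,                                        // minor, major
            0, 4,                                               // constant_pool_count = 4
            1, 0, 2, 'H', 'i',                                  // #1 Utf8 "Hi"
            3, 0, 0, 0, 7,                                      // #2 Integer 7
            7, 0, 1                                             // #3 Class -> #1
        };
        System.out.println(java.util.Arrays.toString(entryOffsets(prefix))); // [-1, 10, 15, 20]
    }
}
```

The index `#3` alone says nothing about byte position 20; that position only falls out after entries `#1` and `#2` have been measured.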

Within byte code instructions, relative offsets are used, which require adding the current instruction’s position to get an absolute position, but note how this doesn’t fit into the concepts named in your question. The absolute positions are still positions within an instruction sequence and hence relative to the memory location of the code when talking about addresses. Further, the relative offsets are not used because “binding is not possible at compile time”; the resulting absolute positions are known at compile time. The Java byte code instruction set is simply defined to use relative offsets to allow more compact code. From an instruction set’s perspective, we could say that it intrinsically supports relative addressing. How the JVM actually implements its execution is up to the JVM.
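As an illustration of the arithmetic, the following sketch decodes a `goto` instruction from a hand-assembled code array and computes the absolute position of its target. The class and method names are hypothetical; the opcode value and the signed 16-bit big-endian operand follow the JVM instruction set definition.

```java
final class BranchOffsetDemo {

    static final int GOTO = 0xA7; // opcode of goto

    /** Returns the absolute code-array index targeted by the goto located at pc. */
    static int gotoTarget(byte[] code, int pc) {
        if ((code[pc] & 0xFF) != GOTO) {
            throw new IllegalArgumentException("no goto at position " + pc);
        }
        // The operand is a signed 16-bit big-endian offset relative to the goto's own position.
        int offset = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
        return pc + offset;
    }

    public static void main(String[] args) {
        // 0: iconst_0   1: goto +4   4: nop   5: ireturn
        byte[] code = {0x03, (byte) 0xA7, 0x00, 0x04, 0x00, (byte) 0xAC};
        System.out.println(gotoTarget(code, 1)); // 5
    }
}
```

The result is still only an index into the method's code array; where that array lives in memory plays no role in the calculation.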

Since you mentioned the JVM’s native code generation: when a JVM generates native code, it knows the target address of the code and can freely decide to use relative or absolute addresses, as it sees fit.

As already mentioned, everything described above happens within one process, so if the operating system uses virtual memory, it’s all in terms of logical addresses, which the operating system may translate, e.g. via the MMU. These concepts are unrelated.

I've read before about the relationship between a process and its logical and physical address spaces. Thanks for the reminder. More specifically, I'm trying to read a decoded class file (generated by the compiler) and I see plenty of numbers referring to the constant pool, the local variable array, the bytecode array, etc. So, inevitably, I'm trying to associate those numbers (relative offsets, if you like) with their logical counterparts, assuming they have not yet been bound by the compiler. And, from your views above, and as others said, that is done by the JVM. — stdout, Aug 19 '16 at 10:40
Plus, I was referring to absolute addresses as physical addresses, and I got your point with the phrase "this doesn’t fit into the concepts named in your question". — stdout, Aug 19 '16 at 10:42
It’s unclear what you mean by “not bound yet by the compiler” when referring to numbers in a class file. They all have been bound by the compiler, as it actually was the compiler which assigned meaning to them. It gets even stranger when you say that you are looking at *decoded* class files, as then it was the decision of the decoder to show you a number instead of the actual item (talking about, e.g., constant pool items). A class file is [just a file](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html) which you can parse without knowing what a JVM will do with it. — Holger, Aug 19 '16 at 15:24
On the first point, I put it that way because I was referring to physical addresses when saying "not bound yet by the compiler". On your second remark, I just used "decoded" to refer to the human-readable form of it (the one disassembled with javap). I didn't want to involve or touch on who is decoding it or how. Hope it's clearer now. — stdout, Aug 22 '16 at 14:09
Like any other ordinary application, a JVM never has to deal with physical addresses. It’s still not clear why you try to “*associate those numbers … with their logical counterparts*”, as `javap` already does that for you. It prints both the numbers and their meaning right in the same line. Well, not for the relative branch offsets, but adding the number at the beginning of a line to the number at the end of the line is not a big deal. — Holger, Aug 22 '16 at 14:38
Well, I'm not trying to map or associate those in a program. I just want to see the big picture and understand the mapping between relative and logical addresses. It's clearer now. Thanks for the discussion and your time. — stdout, Aug 24 '16 at 14:27
