
I have been learning C++ for a while, but currently I know nearly nothing about assembly/machine language and how compiler and hardware work. Sorry if this question is really naive...

Consider the following very simple code:

#include <iostream>
using namespace std;
int main()
{
    int x = 0;
    cout << &x << endl;
}

In my current understanding, the first line in main() asks the compiler to reserve enough memory to hold an int, associate it with the identifier x, and put 0 into that memory location. And the second line in main() prints out the address of the start of that memory location.

I ran the above code twice (consecutively) and got different outputs:

0000002EECAFFB84  // first time
0000007F1FAFF854  // second time

It is also in my current understanding that (please correct me if I have any misunderstandings):

  • when I first run the program, the C++ compiler translates my source code directly into machine code (also called object code, the code that runs directly on hardware).
  • since I modified nothing between my first and second runs, the compiler will NOT generate machine code again, so the machine code used in the second run is the same as in the first.

If my above understandings are correct, then the memory address of x is not determined in the machine code generated by the compiler (otherwise the outputs would be the same), and there must be some intermediate mechanism (between executing the machine code generated by the compiler and creating the int on memory) which decides on the exact memory address where the int will reside.

Is such a mechanism implemented directly by the hardware (e.g. the CPU)? Or by the operating system (so can I say the OS is a kind of "virtual machine" for C++)? Who determines the exact memory address of x, and how?

May I also ask how exactly the compiler generates machine code for &x at compile time, so that an address which hasn't been determined yet can still be retrieved correctly at runtime?

CPPL
  • 3
    You seem to be confusing "when I first run the program" with "when I first compile the program". – 273K Oct 02 '22 at 19:53
  • 4
    The answer to https://stackoverflow.com/questions/29692191/2-questions-regarding-aslr and https://en.wikipedia.org/wiki/Address_space_layout_randomization are good reading. – Retired Ninja Oct 02 '22 at 19:56
  • 1
    The machine code uses the stack pointer, making relative changes to it so things still work even with a randomized starting value on entry to `main` (or on entry to user-space CRT startup code). – Peter Cordes Oct 02 '22 at 21:26
  • 1
    *"It is also in my current understanding that"* -- the point of this section appears valid, but the explanation is not. Your explanation seems to have viewed the "build & run" functionality of your IDE as simply "run". More accurate: *When you **compile** the program, the compiler [does stuff]. After the program is compiled, you can run it. The first time you run the program, you get one result. The second time you run the program you get a different result, even when the program was not re-compiled.* – JaMiT Oct 03 '22 at 03:35

1 Answer


when I first run the program, the C++ compiler translates my source code directly into machine code (also called object code, the code that runs directly on hardware)

The compiler doesn't generate machine code when you run the program; it generates it when you compile the program. The compiler is itself a program. Your C++ source is just text, and the C++ language is defined by a standard. The compiler implements that standard: it reads the text of your program, works out what the standard says it should do, and (together with the linker) writes a file called an executable that contains the machine code.

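When you launch the executable, the program you launch it from (a shell or the desktop environment, which is itself just another program) uses a system call to ask the OS to create a new process that runs the code in that executable. The OS sets up the new process's memory, including where its stack starts. Modern operating systems randomize that location on every launch (address space layout randomization, ASLR), which is why you see a different address on each run.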

Assembly is also text. It is considered lower level than C++ because each line usually corresponds to a single CPU instruction, though not always. An assembler reads that text and translates it into the binary encoding of those instructions.

Machine code and object code are not quite the same thing. Object code normally refers to compiled code that hasn't been linked yet, which means the addresses of some of the symbols it refers to aren't known yet. If the compiler cannot determine the address of a symbol (a function, for example), it leaves an unresolved symbol in the object file. For example, if you include a header and call a function declared in it, the header typically contains only the declaration; the definition is in another object file or in a library. When you link your object files together, the linker looks at the unresolved symbols and tries to find them in the other object files and libraries you passed. If it can't find one, it reports an error.

Building is split into these two steps so that each source file can be compiled on its own. A source file doesn't need the other source files in order to be compiled; it only needs every symbol it uses to be declared, so the compiler can produce an object file with unresolved symbols in it, which the linker then patches. This speeds up builds a lot, because source files can be compiled in parallel and only the ones that changed need recompiling.
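As a minimal sketch of the declaration/definition split described above: the names add_numbers and sum_of_three are hypothetical, and in a real project the three parts would sit in separate files. They share one block here only so that the example compiles on its own.

```cpp
// --- What a header would provide: a declaration only. ---
// The compiler can compile calls to add_numbers from this alone.
// In a separate translation unit, each call would be recorded as an
// unresolved symbol in the object file.
int add_numbers(int a, int b);

// --- A caller that has seen only the declaration above. ---
int sum_of_three(int a, int b, int c) {
    return add_numbers(add_numbers(a, b), c);
}

// --- What another source file or a library would provide. ---
// At link time, the calls above are matched to this definition.
int add_numbers(int a, int b) {
    return a + b;
}
```

If the definition were missing from everything passed to the linker, compiling the caller would still succeed and the failure would only show up at the link step, as an unresolved-symbol error.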

If my above understandings are correct, then the memory address of x is not determined in the machine code generated by the compiler (otherwise the outputs would be the same), and there must be some intermediate mechanism (between executing the machine code generated by the compiler and creating the int on memory) which decides on the exact memory address where the int will reside.

The absolute memory address of x is not fixed in the machine code, but its position relative to the stack is. The address of the current top of the stack is held in a register called the stack pointer. The compiler doesn't know in advance what value that register will hold, and it doesn't need to: it accesses stack data at a fixed offset from the stack pointer (or from a frame pointer derived from it). For &x, it emits code that adds that offset to the register's value at run time. This relative addressing works for all data local to your function, wherever the OS placed the stack.
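One way to see that &x is computed at run time rather than baked into the machine code is recursion: the same &x expression runs in several active calls and yields a different address in each one. A minimal sketch, where collect_addresses and count_distinct_addresses are hypothetical names introduced for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Records the address of the local `x` at every recursion depth.
// The source contains a single `&x` expression, yet each active call
// has its own `x` in its own stack frame.
void collect_addresses(int depth, std::vector<std::uintptr_t>& out) {
    int x = depth;
    out.push_back(reinterpret_cast<std::uintptr_t>(&x));
    if (depth > 0) {
        collect_addresses(depth - 1, out);  // `x` is still alive during this call
    }
}

// Returns how many different addresses were observed for `x`.
std::size_t count_distinct_addresses(int depth) {
    std::vector<std::uintptr_t> addresses;
    collect_addresses(depth, addresses);
    return std::set<std::uintptr_t>(addresses.begin(), addresses.end()).size();
}
```

Calling count_distinct_addresses(4) involves five nested calls whose x objects all exist at the same time, so five distinct addresses are recorded. The exact values depend on where the stack happens to be in that run.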

When a function returns (say, a function you called from main), code emitted by the compiler moves the stack pointer back to where it was before the call; on platforms where the stack grows downward, such as x86-64, that means increasing it. The bytes that held the function's locals are unchanged, but they now lie outside the active part of the stack, so later stack-relative accesses don't touch them. The data is effectively forgotten. If you call another function, it will most likely overwrite those bytes with its own variables.
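The reuse of stack space can be observed without reading any dead variable, by recording only the addresses as integers while the locals are still alive. This is a rough sketch with hypothetical function names. Whether the two addresses actually coincide is not guaranteed and depends on the compiler, optimization level, and inlining.

```cpp
#include <cstdint>

// Returns the address (as an integer) that the local `a` occupied.
std::uintptr_t address_in_first() {
    int a = 1;
    return reinterpret_cast<std::uintptr_t>(&a);
}

// Returns the address (as an integer) that the local `b` occupied.
std::uintptr_t address_in_second() {
    int b = 2;
    return reinterpret_cast<std::uintptr_t>(&b);
}

// True if the second call placed its local where the first call's
// local used to be. Often true in unoptimized builds, never guaranteed.
bool frames_overlap() {
    const std::uintptr_t first = address_in_first();
    const std::uintptr_t second = address_in_second();
    return first == second;
}
```

When frames_overlap() returns true, the second function's variable has taken over the bytes that the first function's variable left behind, which is the overwriting described above.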

For data outside functions (global data), the executable contains a section called the data section, which has room for it. Global data therefore keeps its reserved space for the whole execution of your program, rather than coming and going with function calls.
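For contrast with stack locals, here is a small sketch of a global. The names g_counter and bump_and_locate are hypothetical. Within a single run the global stays at one address and keeps its value between calls.

```cpp
int g_counter = 0;  // hypothetical global: exists for the whole run

// Increments the global and reports where it lives.
const int* bump_and_locate() {
    ++g_counter;
    return &g_counter;
}
```

Two successive calls return the same pointer and leave the counter two higher than before. Whether that address is also the same from one run of the program to the next depends on how the executable was built and on the OS.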

user123