
Assume this simple code:

```c
int main(){return 0;}
```

Using objdump, we can see the memory addresses:

```
0000000100003fa0 _main:
100003fa0: 55                           pushq   %rbp
100003fa1: 48 89 e5                     movq    %rsp, %rbp
100003fa4: 31 c0                        xorl    %eax, %eax
100003fa6: c7 45 fc 00 00 00 00         movl    $0, -4(%rbp)
100003fad: 5d                           popq    %rbp
100003fae: c3                           retq
```

I know that 0x100003fa0 (as an example) is a virtual memory address. The OS will map it to physical memory when my program is loaded.

Two questions:

1- Can the initial address of main be random? Since the addresses are virtual, I'm guessing it can be any value and virtual memory will take care of the rest, i.e. I could literally start from 0x1 (not 0x0, as that's reserved for null)?

2- How does the linker come up with the initial address? (Again, is the starting address random?)

Dan
  • That is a base address plus an offset, and the base can typically be set as part of the linker configuration when building the image. At one time, to speed up load times for projects consuming dozens or hundreds of loadable modules, it was not uncommon to "rebase" the modules so that each would load into a region of virtual address space not occupied by any other module (and thus need no relocation at load time). With the advent of [ASLR](https://en.wikipedia.org/wiki/Address_space_layout_randomization) that up-front preparation is all but worthless. You can still force it, but you'd better have a good reason to do so. – WhozCraig Jan 31 '21 at 20:16

3 Answers


Can the initial address of main be random? Since the addresses are virtual, I'm guessing it can be any value and virtual memory will take care of the rest, i.e. I could literally start from 0x1 (not 0x0, as that's reserved for null)?

The memory being virtual doesn't mean that all of the virtual address space is yours to do with as you please. On most OSes, the executable modules (programs and libraries) need to use a subset of the address space, or the loader will refuse to load them. That is, of course, highly platform-dependent.

So the address can be whatever you want, as long as it is within the platform-specific range. I doubt that any platform would allow 0x1; among other reasons, some platforms require code to be aligned to a boundary larger than a byte.

Furthermore, on many platforms the addresses are merely hints: if they can be used as-is, the loader doesn't have to relocate a given section of the binary. Otherwise, it moves the section to a block of the address space that is available. This is fairly common; e.g. on Windows, 32-bit binaries (such as DLLs) have preferred base addresses, and if that range is available, the loader can load the binary faster. So, in the hypothetical case of the "initial address" being 0x1, and assuming alignment weren't a problem, the code would just end up being moved elsewhere in the address space.

It's also worth noting that "initial address" is a somewhat vague term. The binary modules loaded when an executable starts consist of something akin to sections. Each section has its own base address, and possibly also internal (relative) addresses or address references that are tabulated. In addition, one or more of the executable sections will also have an "entry" address. The loader uses those addresses to execute initialization code (e.g. the DllMain concept on Windows); that code is expected to return quickly. Eventually, one of the sections that nothing else depends on will have a suitably named entry point and will be the "actual" program you wrote: that one keeps running and returns only when the program exits. At that point control may return to the loader, which notes that nothing else is to be executed, and the process is torn down. The details of all this are highly platform-dependent. I'm only giving a high-level overview; it's not literally done this way on any particular platform.

How does the linker come up with the initial address? (Again, is the starting address random?)

The linker has no idea what to do by itself. When you link your program, the linker is fed several more files that come with the platform itself: linker scripts and various static libraries needed to make the code able to start up. The linker scripts give the linker the constraints within which it can assign addresses, so it's all highly platform-specific again. The linker normally assigns addresses in a deterministic fashion, i.e. the same inputs always produce identical output. Randomizing where things end up in memory is usually done at load time by the OS loader (in a non-overlapping fashion, of course); the linker's part is to produce position-independent or relocatable output that can be placed at any suitable base. That's known as ASLR (address space layout randomization).

Kuba hasn't forgotten Monica
  • To confirm: virtual memory is essentially assigned to binaries at link time, and if I try to use anything outside of that range at runtime, I will get a segfault. Is that correct? – Dan Jan 31 '21 at 22:59
  • No, virtual memory is assigned to binaries at **loading** time, assuming you're on certain operating systems such as Linux, Solaris, Windows, OS X, or iOS. And you'll only "get" a segfault if you want to: a "segfault" is when a certain CPU-generated runtime exception goes unhandled. Most "segfaults" are handled by the kernel and are responsible for bringing the pages of your binary from file storage into RAM. Linking assigns *relative* addresses within each section and tabulates relationships between sections, so that the loader can then resolve all that when you run the program. – Kuba hasn't forgotten Monica Feb 01 '21 at 19:36

I'm not sure about Visual C++, but gcc (or rather ld) uses a linker script to determine final addresses. A custom script can be specified with the -T option. Full details of GNU ld linker scripts can be found at: https://sourceware.org/binutils/docs/ld/Scripts.html#Scripts.

Normally you don't need to play with this, since your toolchain will be built either for the host machine or, when cross-compiling, with the correct settings for the target.

For ASLR and .so files, you will need to compile with -fPIC or -fPIE (position-independent code or position-independent executable; a PIE is also linked with -pie). Your compiled code will then only contain offsets relative to some base address in memory. The base address is set by the operating system loader before your application runs.

doron

Those addresses are base addresses and offsets. An ELF file contains special information on how to calculate the actual addresses when the program is loaded. It is a rather advanced topic, but you can read about how an ELF file is loaded and executed here: How do I load and execute an ELF binary executable manually? or https://linux-audit.com/elf-binaries-on-linux-understanding-and-analysis/

0___________