
Consider the program below.

int a = 0x45;
int main()
{
   int i = a;
   return 0;
}

;; asm code
call   0x401780 <__main>
mov    0x402000,%eax   // why does it allocate 0x402000 only for global 'a'?
mov    %eax,0xc(%esp)
mov    $0x0,%eax
leave

This is the equivalent assembly code generated by Code::Blocks on Windows XP. I understand that 0x402000 is an address in the data segment. But is that memory location hardcoded by the compiler?

I think it is not hardcoded, because that memory location may or may not already be in use by other applications.

As we know, the operating system allocates a stack frame for local variables and returns the base address of the stack frame, and local variables are accessed through the %esp and %ebp registers with an offset.

Does the operating system do the same for global variables? If it does, why is the address hardcoded?

dw a 0x40 ; this directive allocates memory in the data segment
mov a,%ax ; copies the value of a to the accumulator

But how does the compiler know that 'a' has the memory address 0x402000? If the compiler has hardcoded the address as 0x402000, it should first make sure that the address is not used by another application, right?

If the operating system allocates memory in the data segment, the address should vary depending on the applications and resources. Could anyone explain what really happens when I define global variables?

user3205479
  • You should read about virtual addressing. – The Paramagnetic Croissant Oct 19 '14 at 12:51
  • Does it matter to you? – Ed Heal Oct 19 '14 at 12:52
  • @EdHeal Yes, I am trying to understand the internal architecture of C program execution and flow by comparing with asm :) – user3205479 Oct 19 '14 at 12:54
  • It depends on the OS/compiler. So what you understand for one combination is not necessarily true for another. Besides, there is no internal architecture per se. All the compiler does is convert your code into binary for the platform/OS/CPU... – Ed Heal Oct 19 '14 at 12:58
  • Read Levine's book [linkers & loaders](http://www.iecc.com/linker/) – Basile Starynkevitch Oct 19 '14 at 13:03
  • The compiler picks the *offset*, the linker picks the base address, the operating system decides if it is available and relocation is required. The days that addresses are predictable are fast coming to an end, too dangerous, ASLR is becoming the rule. – Hans Passant Oct 19 '14 at 13:16
  • Your confusion is understandable. My guess is that no one has ever explained to you what a relocatable linker is, or does. Prepare your brain in advance for a real mind trip, and let me encourage you to have fun and be persistent. What you're asking is really fun stuff, and if you are into this, you're going to really enjoy software. Do not be discouraged or intimidated. Stick with it, and be tolerant of software nerds who don't have the written verbal skills to explain this in a single sentence. (It requires about 3 or 4 chapters in a book, really.) – User.1 Oct 19 '14 at 18:14

2 Answers


As Prof. Falken mentioned, this depends on the compiler and system, but here is what happens on Linux, Windows, and Mac with the popular/primary toolchains:

The compiler takes the high-level source and makes assembly out of it; the assembler turns that into an object. The object resolves what relative addresses it can, but leaves clues (relocation entries) for the linker.

The linker...links. It takes the objects, their binary blobs, and arranges them into the binary address space it is told about; it picks the addresses for things like globals and functions. Basically it places .text, .data, and .bss.

Then there is the MMU in hardware, which has made life much simpler. You can, for example, compile every program for, say, address 0x8000 as an entry point, and have many programs all running at address 0x8000 at the same time. They all think they are at that address because, in the virtual address space on the virtual side of the MMU, they are. On the physical side they are all actually living at different addresses, but generally only the operating system needs to care about that.

Compilers these days typically place functions in the object in the order we wrote them in the source code; the .data and .bss items they sometimes rearrange on us. The linkers generally do as they are told, and who tells them? Ultimately us, the programmers, but the toolchain provided to you has defaults (like automatically assembling the compiled code into an object and automatically linking), including the bootstrap code and a default linker script. That default linker script for that compiler and that target operating system is set up per the rules of that operating system.

The above is what you typically see with gcc and other primary compilers for the leading operating systems: Windows, Mac, and *nix. That doesn't mean there aren't toolchains out there now that do something different, such as compilers that go straight to a final binary, or assemblers that go straight to a final binary rather than an object. Historically it wasn't always this way either. Until you get into those corner cases, I assume you are going to have the above experience as you dig into the tools.

old_timer

This depends on the operating system and the compiler.

For instance on the Amiga, if I remember correctly, absolute addresses were stored inside the executable file on disk. But when the OS loaded the binary, it would rewrite, on the fly, the addresses to fit into the memory area it had allocated for the program.

In your case, I think the addresses can be absolute in the 64k limit of a DOS "small" memory model program. 64k is a segment in the 8086 architecture and DOS would allocate a full segment for each "small" memory model program it loads. ".COM" files are loaded as is into a 64k DOS segment.

I may not get the terminology and details exactly right, but my main point is, that it depends on the operating system and compiler in question.

Prof. Falken
  • My question is: can an operating system understand binary code? How does it know all the occurrences of global variables, which are in binary code? For instance, the code is 101010101, right? How does the operating system figure out which global variables are in machine code form? – user3205479 Oct 19 '14 at 12:59
  • @user3205479 - Why do you think that the OS cares whether a variable is global or not? It does not. It just switches pages of memory when required. Those pages may contain the global variables... – Ed Heal Oct 19 '14 at 13:02