
Suppose I have a program written in C and two identical computers, one running Windows and the other running Linux. Since the computers are identical, their processors have the same instruction set, so the machine code produced by compilation should be the same. So why do I need to compile my program twice? Assume I don't call any OS-related function or anything else that depends on the actual OS.

dnnagy

1 Answer


Machine code does not depend on the OS; it is the same for the same CPU.

If you wrote an OS-agnostic piece of machine code for a given CPU mode (say, 32-bit x86) and loaded it into some ROM so that it was available, you could map that part of the ROM in both Windows and Linux (using completely different OS APIs to map physical memory and give it execute rights) and jump to it. The machine code in ROM would run the same way on both.

So why do I need to compile my program twice? Suppose I don't call any OS-related function, or something that depends on the actual OS.

You don't have to. But you usually want some entry point into your code, and the simplest way to provide a universal entry point is usually to follow the ABI (Application Binary Interface) defined by the OS. For example, in 32-bit Windows you read arguments from the stack, while in 64-bit Linux you receive arguments in registers (when possible). If you don't adjust your procedure prologue to pick up arguments in the correct way, the code will operate on the wrong inputs on the OS it was not written for.
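As a rough illustration of how the ABI decides where a function finds its arguments, here is a small lookup sketch in Python. The names INT_ARG_REGISTERS and arg_locations are made up for this example, and the "stack[n]" labels are a simplification: they ignore the return address, alignment padding, and the shadow space that the Microsoft x64 convention reserves.

```python
# Integer/pointer argument registers, in order, for three common x86 conventions.
INT_ARG_REGISTERS = {
    "sysv_x86_64": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],  # 64-bit Linux
    "win_x86_64": ["rcx", "rdx", "r8", "r9"],                  # 64-bit Windows
    "cdecl_x86_32": [],                                        # 32-bit: all on the stack
}


def arg_locations(abi, arg_count):
    """Return where each of the first `arg_count` integer arguments is passed.

    Simplified: stack slots are numbered from 0 and ignore the return
    address, padding, and Windows shadow space.
    """
    regs = INT_ARG_REGISTERS[abi]
    locations = []
    for i in range(arg_count):
        if i < len(regs):
            locations.append(regs[i])
        else:
            locations.append(f"stack[{i - len(regs)}]")
    return locations


for abi in INT_ARG_REGISTERS:
    print(abi, arg_locations(abi, 5))
```

The same C function `f(a, b)` therefore has to read `a` from rdi under one convention, from rcx under another, and from the stack under a third, even though every instruction involved is ordinary x86 machine code.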

But the machine code itself, the CPU instructions, is the same.

That said, on x86 the situation is a bit hairier because of historical backward compatibility: the CPU can be in 16-bit mode, 32-bit [protected] mode (there are several of them, and they can be set up differently), or 64-bit mode. The 80386 instruction mov eax,1 has a different machine code encoding in 16-bit mode than in 32-bit mode.
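To make the encoding difference concrete, here is a small hand-assembly sketch in Python. The function name encode_mov_eax_imm32 is made up for this illustration. It assumes the standard `B8+rd` opcode for `mov r32, imm32` and the `0x66` operand-size prefix, which is needed in 16-bit mode to select a 32-bit operand.

```python
import struct


def encode_mov_eax_imm32(value, mode_bits):
    """Hand-encode `mov eax, imm32` for a CPU running in the given mode."""
    if mode_bits not in (16, 32, 64):
        raise ValueError("mode_bits must be 16, 32 or 64")
    # In 16-bit mode the default operand size is 16 bits, so a 32-bit
    # operand needs the 0x66 operand-size override prefix.
    prefix = b"\x66" if mode_bits == 16 else b""
    opcode = b"\xb8"                 # B8+rd with rd = 0 (eax)
    imm = struct.pack("<I", value)   # 32-bit little-endian immediate
    return prefix + opcode + imm


print(encode_mov_eax_imm32(1, 32).hex(" "))  # b8 01 00 00 00
print(encode_mov_eax_imm32(1, 16).hex(" "))  # 66 b8 01 00 00 00
```

Note that the OS does not appear anywhere in this function: the bytes depend only on the instruction and the CPU mode.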

But as long as you are targeting the same CPU mode, the same instruction is assembled to the same machine code. You just write the source differently to follow a different ABI.

As for executable files, each format is different, and it is not even "per OS": again for historical reasons, almost every x86 OS supports several executable file formats. The metadata stored around the machine code in the file (used by the OS when loading the machine code into memory and setting it up to run) is completely different between formats.
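The container difference is visible in the very first bytes of a file. The following Python sketch (the function name guess_executable_format is hypothetical) checks only two well-known signatures: the ELF magic used on Linux and the "MZ" DOS header that Windows PE files start with. It is not a full format parser.

```python
def guess_executable_format(header: bytes) -> str:
    """Guess an executable container format from its leading bytes."""
    if header.startswith(b"\x7fELF"):
        return "ELF"      # typical on Linux
    if header.startswith(b"MZ"):
        return "MZ/PE"    # DOS stub that begins Windows PE files
    return "unknown"


# Synthetic headers, just enough bytes to carry the signature.
print(guess_executable_format(b"\x7fELF\x02\x01\x01\x00"))
print(guess_executable_format(b"MZ\x90\x00"))
print(guess_executable_format(b"\x00\x00\x00\x00"))
```

To try it on a real file, pass the first few bytes read in binary mode, for example `open(path, "rb").read(4)`. The loader of each OS uses this kind of metadata to decide how to place the (otherwise identical) machine code in memory.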

A practical example is the Linux application Wine, which can run Windows executables by providing fake OS hook points that simulate Windows and by understanding the Windows executable format well enough to load the binaries into memory correctly. The machine code of such a Windows application runs natively, without any further patching.

Ped7g
  • One more note. Assemblers usually produce "object files", which are again in a toolchain-specific format, so Microsoft Visual Studio stores the same assembled machine code in ".obj" files that differ from the ".o" files gcc produces on Linux. The machine code part of the object file is the same, but the metadata that allows such a file to be linked may be completely different (plus a different format for debug metadata, etc.). So that's another reason why you must compile the same source several times, but it's per-toolchain rather than per-OS. – Ped7g Dec 15 '16 at 00:15
  • Also, ABI differences in type sizes: `long` is 32 bits on x86-64 Windows, but 64 bits in the System V x86-64 ABI. So the same C struct can mean different things when compiled for different ABIs, not to mention single variables being different sizes (and thus needing different operand-size in the machine code, and different stack layout for locals, etc.) – Peter Cordes Dec 15 '16 at 03:40
  • It is worse than "per-toolchain". It is per-version-of-toolchain. Binary compatibility of intermediate files is not guaranteed in perpetuity for all tools from a particular vendor. I'm not sure if the GCC folks try to provide backwards compatibility here, but Microsoft explicitly does not. (Which is rather ironic, actually.) The only thing that works cross-toolchain is the debugging information, since that has a standardized format. But, of course, the best thing about standards is there are so many to choose from: COFF, ELF, CodeView, PDB, ... :-) – Cody Gray - on strike Dec 15 '16 at 12:39
  • So if you strip the OS-specific metadata from the executable file, is the machine code the same on any OS, as long as the same CPU architecture is used? Even if the program contains some system calls? – Charan Sai Feb 01 '22 at 16:48
  • @CharanSai if the program contains some system calls, that's also machine code, and that part is different per OS. Everything that is not OS-related can be the same, but if you learn a bit of assembly, you will see there are [different] OS conventions even down to how different parts of the code pass arguments between themselves. So even without system calls the resulting machine code may differ slightly, but for the CPU itself it's the same instruction set. Another way to think about it: the CPU has no idea what OS is running, and the OS does not modify CPU instructions, it just uses them differently. – Ped7g Feb 02 '22 at 06:52
  • @CharanSai you can even execute code built for a different OS, and it will run up to the system call, which is also a valid instruction and will get executed. The difference is what happens next: on the correct OS, the configuration makes the CPU land in valid system code that provides the desired service; on the other OS, the configuration probably just catches the illegal call from a misbehaving app, reports to the user that it tried something weird, and stops its process. But the instruction itself is executed the same way by the CPU; only the configuration of where it leads is different, as that is per OS. – Ped7g Feb 02 '22 at 06:59
  • @Ped7g any system call would eventually come down to the same assembly instruction if the CPU is the same, right? The API function might be different, like fork in Linux and CreateProcess in Windows, but it would assemble to `syscall` or `sysenter` or `int 80h` depending on the CPU, right? So why would it depend on the OS then? – Charan Sai Feb 02 '22 at 12:55
  • @CharanSai no. You can create an OS that hooks its system services onto a different mechanism; there's nothing (except convenience and common sense) forcing you to use `syscall` as the system call entry on x86_64. You could, for example, require every user app to call some absolute 64-bit address, etc. It's up to you as the OS author to decide what the convention is on your OS, and up to software authors to follow it if they need an OS call. IIRC all current x86 OSes differ in one way or another: even when they use the same instruction, the ordering of register arguments is different, etc. – Ped7g Feb 03 '22 at 16:07
  • @CharanSai just to be super clear: it's not just a different name for an API function that still compiles to the same assembly. (Made-up example, didn't check the real ABI.) For example, Linux "do_something(int value)" may compile to rax=12345 (the "do_something" service number), rbx=value and `syscall`, while Windows may have identical functionality under r12=34567 (a "do_something"-like service) and rax=value, then `syscall`. So you need almost identical, but different, machine code for each OS, respecting its scheme for passing arguments. – Ped7g Feb 03 '22 at 16:14
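The `long` size difference mentioned in the comments above can be observed without a C compiler. This is a minimal sketch using Python's ctypes module, which reports the sizes used by the C ABI of the platform the interpreter was built for. The Record structure and the describe_abi function are made up for this illustration.

```python
import ctypes


class Record(ctypes.Structure):
    # Mirrors the C declaration: struct record { char tag; long value; };
    _fields_ = [("tag", ctypes.c_char), ("value", ctypes.c_long)]


def describe_abi():
    """Report a few ABI-dependent sizes for the platform running this script."""
    return {
        "pointer_bits": ctypes.sizeof(ctypes.c_void_p) * 8,
        "long_bytes": ctypes.sizeof(ctypes.c_long),
        "record_bytes": ctypes.sizeof(Record),
        "value_offset": Record.value.offset,
    }


print(describe_abi())
```

Run on 64-bit Windows and on 64-bit Linux, the same script should report a different `long_bytes` (4 versus 8, per the comment above), and with it a different size and field offset for the same struct declaration. That is one reason the compiler emits different operand sizes and stack layouts for the same C source.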