
How does an application program know its entry point is the main() function?

I understand that an application does not itself know that its entry point is main(); it is directed to main() by whatever the language specification says.

But where is that rule actually implemented? In C, for example, the entry point shall be the main() function. Who provides this mechanism to the program: the operating system or the compiler?

I came to the question after disassembling a canonical simple "Hello World" example in Visual Studio.

In this code there are only a few lines and a function main().

But after disassembling it, there are lots of definitions and macros in the memory space, and main() is not the only declaration and definition.

A screenshot of the disassembly is below. I also know the language definition has a strict rule that exactly one main() function must be defined.

To summarize my question: which mechanism directs execution to main(), or sets main() as the entry point of an application program?

[Screenshot of the disassembly in Visual Studio]

ecm
Nazim
  • There is a runtime library, or startup code, that is linked with your main program, and it knows it should jump to the `main` function after certain preparations are done. You can see this because linking fails if you don't provide a `main` function. – Eugene Sh. Feb 11 '20 at 18:52
  • In a program's [header](https://learn.microsoft.com/en-us/windows/win32/debug/pe-format): `AddressOfEntryPoint`: "The address of the entry point relative to the image base when the executable file is loaded into memory. For program images, this is the starting address." This points to the startup code mentioned above. – Jongware Feb 11 '20 at 18:54
  • @EugeneSh. Where does the runtime library actually reside, and do all programs, even in low-level languages, have that runtime library? – Nazim Feb 11 '20 at 18:54
  • What low-level languages? We are talking about C here; in other languages there is no `main`, they have other mechanisms. The runtime library is a part of your compiler toolchain. For the most common case in a hosted environment it would be [`crt0.o`](https://en.wikipedia.org/wiki/Crt0) – Eugene Sh. Feb 11 '20 at 18:56
  • @usr2564301 Is the header a format that is defined for all platforms, such as Linux or macOS? – Nazim Feb 11 '20 at 18:57
  • @EugeneSh. Yes, I mean C, sorry for that, and thanks for the other direction. – Nazim Feb 11 '20 at 18:58
  • Each to its own. The one I pointed to is for Windows; other platforms have their own system (ELF, for instance). Modern OSes *must* interpret such headers before they can allocate memory, load the program (and related libraries), relocate if necessary, and initialize static and uninitialized data, before finally starting at `entryPoint`. – Jongware Feb 11 '20 at 19:03
  • @usr2564301 Thank you very much that was really clear and explanatory – Nazim Feb 11 '20 at 19:04
  • Depending on the design, the compiler and the C library (runtime or static) are separate entities that are brought together, just like the compiler, assembler, and linker; granted, for them to fit there is a system design behind it. In the GNU world the bootstrap code that precedes the call to main is part of the C library, and there are and have been different C libraries to choose from for the GNU toolchain, carrying with them different bootstraps and linker scripts, and then target-specific matters too. – old_timer Feb 11 '20 at 19:22
  • The system-level design, which the conversation now includes, covers the operating system, its loader, and the file formats it supports (ELF as an example, but often not limited to it). The ELF or other file has to be built according to the operating system's rules; you can't just take one file and carry it across operating systems, and it is not assumed to work in general. ELF, for example, is a generic container format useful across different applications. – old_timer Feb 11 '20 at 19:24
  • @old_timer, so there is actually no standard that specifies how to do it? It depends on the design choice, and you mean there is no single way to accomplish this, including in embedded systems? – Nazim Feb 11 '20 at 19:32
  • @old_timer, Thank you very much, the points you have mentioned are really detailed and reflect experience. – Nazim Feb 11 '20 at 19:39
  • The OS and loader (and as a result binary and library) design may be that the loader takes care of the C assumptions/rules, such as .bss being zeroed, .data being initialized, and the stack pointer pointing to the application's top of stack. Or it can be that the bootstrap does all of these things (as well as a number of others not mentioned), or a combination of the two. – old_timer Feb 11 '20 at 19:43
  • Yes, it is very much a design choice. The operating system, for starters, dictates which file formats it will support and then how each format is used. It is not necessarily the case that the operating system must interpret the headers; rather, the headers must conform to the operating system's rules/requirements for that format. Read into that whatever you want: choosing a file format and deciding how to implement its fields/features is a combination of what the file format can support and how, or whether, the OS wants to use them, and then both sides have to conform to the decision. – old_timer Feb 11 '20 at 19:45
  • You are already on the right path. You can use readelf -a and objdump -D to look at ELF files for Linux, for example, and see the _start entry point and the minimal code that is in there (at least on my Linux on my machine), so I assume the loader is doing the .bss and .data part. But I work at this level in bootloaders, where you do more of the work yourself because you conform to the processor logic's rules, not an operating system's rules... – old_timer Feb 11 '20 at 19:46
  • @old_timer Thank you very much, sir. The things you have explained are like pearls to me, and I will now research the things you have mentioned. – Nazim Feb 11 '20 at 19:50
  • `ld` has an option `-e` to set the entry point. I imagine that gcc passes `-e __start` and `crt0.o`, which has `__start`, and that calls the external symbol `_main`. – Erik Eidt Feb 11 '20 at 20:01
  • The ld linker script language also has a way to specify the entry point. – old_timer Feb 11 '20 at 20:49

1 Answer


The application does not know that main() is the entry point. First, we assume C, not C++, here, despite your picture.

For C, the "C" entry point is main(). But you can't just start execution there, because C comes with assumptions, or rather rules: for example, .data needs to be initialized and .bss zeroed.

unsigned x = 1;
unsigned int y;

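We expect that x == 1 when main() is reached. Most people also assume that y == 0 at that point, and the C standard does specify it (objects with static storage duration and no initializer are zero-initialized). I personally wouldn't rely on that assumption, but anyway.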
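As a small illustration of that expectation, the sketch below restates the two declarations and checks their values. The helper name `check_startup_state` is made up for this example; the idea is to call it as the first statement of main() on a hosted implementation. It assumes a startup path that honors the standard's initialization rules; a hand-written bare-metal bootstrap that skips zeroing .bss would make the second check fail.

```c
#include <assert.h>

unsigned x = 1;   /* explicit initializer: typically placed in .data */
unsigned int y;   /* no initializer: typically placed in .bss */

/* Hypothetical helper: call it as the first statement of main(). */
void check_startup_state(void)
{
    assert(x == 1);  /* set up before main() by the loader or the bootstrap */
    assert(y == 0);  /* static storage duration, so zero-initialized */
}
```

Neither check passes because of anything main() did; both values were put in place before main() was entered.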

We also need a stack pointer and need to deal with argc/argv. If this were C++, additional work would have to be done, and even for C there can be more, depending on the system.

The APPLICATION does not generally know any of this. You are likely working with a C library, and that library is, or should be, responsible for the bootstrap code that precedes main() as well as a linker script for the linker, since bootstrap and linker script are intimately related. One could argue, based on some implementations, that the C library is separable from the toolchain: with GNU you can choose from different C libraries, and those have different bootstraps and linker scripts. But I am sure there are many toolchains where the two are tightly coupled. There is also a relationship between the library and the operating system, since so many C library calls end up in one or many system calls.

Suppose you design an operating system. Assuming it supports runtime-loadable applications, part of that design is a file format that the operating system's loader supports, the features the loader wants to support, and how those overlap with the file format. It is not uncommon for the OS to define the file format, but with ELF and others (no doubt not created accidentally or independently) a new OS has the opportunity to use an existing container. The OS design and its loader determine a lot of things, and the C library that mates up with all of that has to follow all of those rules; if the library is integrated into the compiler, then the compiler has to play along as well.

It is not the application that knows; it is part of the system design, and the application is simply bound by all of that. When you compile on that platform for that platform, all of these rules and relationships are in play. You are putting in a very small piece of the puzzle; the rest is already in place: which file formats are supported, what information each format requires, and which rules the compiler/library solution must satisfy. The system design dictates whether .data is initialized and .bss zeroed by the loader or by the application, and by "the application" I mean the bootstrap, not the user's portion of the program. You can't bootstrap C in C, because that C would need a bootstrap, and if that bootstrap were in C it would need a bootstrap too, and so on.
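To make the bootstrap's job concrete, here is a hosted simulation of it, not a real startup file. Every name in it (`data_load_image`, `data_region`, `bss_region`, `simulate_power_on_ram`, `fake_start`, `app_main`) is invented for illustration. On a real target the region boundaries would normally come from linker-script symbols, and the first instructions would be assembly, for the reason given above.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for what a linker script would normally describe. */
const unsigned char data_load_image[4] = { 1, 2, 3, 4 }; /* initial values stored in the binary */
unsigned char data_region[4];  /* where .data lives at run time */
unsigned char bss_region[8];   /* where .bss lives at run time */

/* Pretend RAM holds garbage, as it may before any startup code has run. */
void simulate_power_on_ram(void)
{
    memset(data_region, 0xAA, sizeof data_region);
    memset(bss_region, 0xAA, sizeof bss_region);
}

/* Stand-in for the user's main(): it relies on the C rules already holding. */
static int app_main(void)
{
    assert(data_region[0] == 1);  /* .data has its initial values */
    assert(bss_region[7] == 0);   /* .bss reads as zero */
    return 0;
}

/* Simulated bootstrap: the work that has to happen before main(). */
int fake_start(void)
{
    memcpy(data_region, data_load_image, sizeof data_region); /* initialize .data */
    memset(bss_region, 0, sizeof bss_region);                 /* zero .bss */
    return app_main();                                        /* only now enter "main" */
}
```

Calling `simulate_power_on_ram()` and then `fake_start()` from a test program exercises the sequence. Calling `app_main()` without the two setup steps would trip its assertions, which is the sense in which main() cannot simply be jumped to.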

int main ( void )
{
    return 0;  
}

There are a lot of things going on in the background when you compile that program, not just the few instructions that might be needed to implement that code.

Compile that program on Windows, Linux, and macOS, on different versions of each, with different compilers or C libraries and different versions of those. What you should expect to see is that, even for the same target ISA or even the same computer, some percentage of the combinations MIGHT choose the same few instructions for the function itself, while what is wrapped around it will be perhaps similar but not the same. There would be no reason to be surprised if some of the implementations are very different from each other.

And this is all for full-blown operating systems that load programs into RAM and run them; for embedded targets, don't be surprised if the differences are even bigger. Within a full-blown OS you would expect to see an MMU, and the application gets a perhaps zero-based address space for .text, .data, and .bss at a minimum. So all the solutions might have a favorite place for the sections, or a favorite number of sections in the same order in the binary, but the size of each may be specific to the implementation. The order and size might vary by C library version, compiler version, and so on.

The magic is in the system design, and that is not magic, that is design. main() cannot be entered directly while still having the various parts of the language work, such as .data and .bss initialization. The stack pointer can be set up before the entry, but how and where .data and .bss are located is application-specific, so it can't be handled by a simple branch to main from the OS.

The linker for your toolchain can be told in various ways where the entry point is: it could be assumed or dictated for that tool and target, or set by a command-line option, a linker script option, some special symbol you put on a label, or whatever the designers choose. main is assumed to be the C entry point, although that doesn't actually mean it is the first C code to run; there might be some C code that precedes it. In general there is some amount of asm (you can't bootstrap C with C) and then one or more steps to main().
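One way to observe C code running before main() is the `constructor` function attribute. This is a GCC/Clang extension, not standard C, and MSVC does not support this syntax, so treat the sketch below as toolchain-specific. The names `ran_before_main`, `early_hook`, and `check_hook_ran` are invented for the example.

```c
#include <assert.h>

/* Set by early_hook(); stays 0 if the hook never ran. */
int ran_before_main = 0;

/* GCC/Clang extension: the startup code calls constructor
   functions before it calls main(). */
__attribute__((constructor))
static void early_hook(void)
{
    ran_before_main = 1;
}

/* Hypothetical helper: call it at the top of main(). */
void check_hook_ran(void)
{
    assert(ran_before_main == 1);
}
```

If `check_hook_ran()` passes as the first statement of main(), then the startup code executed `early_hook()` earlier, so main() was not the first C function to run in the process.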

halfer
old_timer
  • Is the same logic valid for embedded systems? For example, on an ARM Cortex-M4 MCU with no RTOS running on it, how do these operations occur? My host is Windows. – Nazim Feb 11 '20 at 21:43
  • Every programmer should be taught that in situations like your example with y, the value of y is not known and cannot be assumed to be zero, at least in C. – Rick Henderson Nov 10 '21 at 21:35