I'm trying to understand how the linker and loader operate, and how memory addresses (physical or virtual) come into play when a program is compiled and executed. I came across two pieces of information and formed my own understanding from them.

1st information:

W.5.1 SHARED OBJECTS

In a typical system, a number of programs will be running. Each program relies on a number of functions, some of which will be standard C library functions, like printf(), malloc(), strcpy(), etc., and some of which are non-standard or user-defined functions. If every program uses the standard C library, it means that each program would normally have a unique copy of this particular library present within it. Unfortunately, this results in wasted resources and degrades efficiency and performance. **Since the C library is common, it is better to have each program reference the common, one instance of that library, instead of having each program contain a copy of the library. This is implemented during the linking process, where some of the objects are linked at link time whereas others are linked at run time (deferred/dynamic linking).**

2nd information:

C Library

Main Articles: See C Library, Creating a C Library One thing up front: When you begin working on your kernel, you do not have a C library available. You have to provide everything yourself, except a few pieces provided by the compiler itself. You will also have to port an existing C library or write one yourself. The C library implements the standard C functions (i.e., the things declared in , , etc.) and provides them in binary form suitable for linking with user-space applications. In addition to standard C functions (as defined in the ISO standard), a C library might (and usually does) implement further functionality, which might or might not be defined by some standard. The standard C library says nothing about networking, for example. For Unix-like systems, the POSIX standard defines what is expected from a C library; other systems might differ fundamentally. It should be noted that, in order to implement its functionality, the C library must call kernel functions. So, for your own OS, you can of course take a ready-made C library and just recompile it for your OS - but that requires that you tell the library how to call your kernel functions, and your kernel to actually provide those functions. A more elaborate example is available in Library Calls or, you can use an existing C Library or create your own C Library.

The way I understood:

When a computer boots, it initially has no access to a C library and must work with machine code. With the help of the boot code, it eventually starts loading the OS. In this example, I will assume the computer loads Linux, so a Linux kernel is loaded.

When the Linux kernel is booted, this also means that the standard C library (basic functions like printf for example) is also loaded on to low memory (the portion of RAM assigned to kernel space). Assume that a user has written a simple program that uses printf() from the standard C library. The user compiles this code, and during this process the linker makes a 'reference' for printf(), that is, the position where the printf() function resides in low memory. When the program is executed, the loader loads the executable from the HDD into high memory (the portion of RAM assigned to user space). When the process reaches the printf() call, it branches to the low memory address containing the start of the printf() function.

Am I correct? If not, where am I wrong?

do_os

2 Answers

You are wrong.

1.) There is no need to put libc into the kernel. It doesn't affect any low-level system or hardware-dependent components.

2.) libc.so is an ordinary dynamic library.

Now some more details:

When you launch your application, e.g. from a bash console, bash forks and execs a new process. What does that mean? It means that the OS creates an address space, maps .text and .data from the ELF file, sets up a zero-filled .bss, and reserves virtual address space for the stack. You can see these mappings here:

sudo cat /proc/1118/maps 
00400000-00407000 r-xp 00000000 08:01 1845158                            /sbin/getty
00606000-00607000 r--p 00006000 08:01 1845158                            /sbin/getty
00607000-00608000 rw-p 00007000 08:01 1845158                            /sbin/getty
00608000-0060a000 rw-p 00000000 00:00 0 
00ff3000-01014000 rw-p 00000000 00:00 0                                  [heap]
...
7f728efd3000-7f728efd5000 rw-p 001bf000 08:01 466797                     /lib/x86_64-linux-gnu/libc-2.19.so
7f728efd5000-7f728efda000 rw-p 00000000 00:00 0 
7f728efda000-7f728effd000 r-xp 00000000 08:01 466799                     /lib/x86_64-linux-gnu/ld-2.19.so

7f728f1fe000-7f728f1ff000 rw-p 00000000 00:00 0 
7fffa122b000-7fffa124c000 rw-p 00000000 00:00 0                          [stack]
7fffa1293000-7fffa1295000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
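
A process can also inspect its own mappings without sudo by reading /proc/self/maps. The following is only a minimal sketch, assuming a Linux system with /proc mounted; the function name count_mappings is made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print every line of /proc/self/maps that contains `needle`.
   Returns the number of matching lines, or -1 if the file cannot be opened.
   (Hypothetical helper, not part of any library.) */
int count_mappings(const char *needle) {
    assert(needle != NULL);
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[4352];   /* room for the address columns plus a long path */
    int matches = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, needle) != NULL) {
            fputs(line, stdout);
            matches++;
        }
    }
    fclose(maps);
    return matches;
}
```

Calling count_mappings("libc") from a dynamically linked program should list the libc regions mapped into that process, and count_mappings("[stack]") should show the stack region.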

But there is more. After loading those segments, the Linux kernel also loads ld-linux.so into memory (you can see it in the mappings). This is the dynamic linker, and ld-linux is responsible for loading all dynamic libraries. As you might know, by the time the application has been compiled and linked, the list of shared libraries it will use is already known. You can check it with the ldd command:

ldd /sbin/getty 
linux-vdso.so.1 =>  (0x00007fff4cfa6000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2af2832000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2af2c24000)

This list is stored in the ELF file itself, as DT_NEEDED entries in the dynamic section. After it is loaded, ld-linux reads this list and searches for each needed library in predefined (standard) paths such as /usr/lib. It can then simply mmap regions for the libraries it located. That is how libc gets loaded into the process address space.
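
To see where the library list lives in the file, one can read the DT_NEEDED entries directly. The sketch below is not how ld-linux itself is implemented; it is a simplified reader that assumes a 64-bit ELF file in the host's byte order, and the names print_needed and vaddr_to_offset are invented for this example.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Translate a virtual address into a file offset via the PT_LOAD segments. */
static long vaddr_to_offset(const Elf64_Phdr *ph, int n, Elf64_Addr va) {
    for (int i = 0; i < n; i++) {
        if (ph[i].p_type == PT_LOAD && va >= ph[i].p_vaddr &&
            va < ph[i].p_vaddr + ph[i].p_filesz)
            return (long)(va - ph[i].p_vaddr + ph[i].p_offset);
    }
    return -1;
}

/* Print the DT_NEEDED names of a 64-bit ELF file.
   Returns how many were found (0 for a static binary), or -1 on error. */
int print_needed(const char *path) {
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int count = -1;
    Elf64_Ehdr eh;
    Elf64_Phdr *ph = NULL;
    Elf64_Dyn *dyn = NULL;

    /* Read and sanity-check the ELF header. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr) || eh.e_phnum == 0)
        goto out;

    /* Read the program header table. */
    ph = malloc(eh.e_phnum * sizeof *ph);
    if (ph == NULL || fseek(f, (long)eh.e_phoff, SEEK_SET) != 0 ||
        fread(ph, sizeof *ph, eh.e_phnum, f) != eh.e_phnum)
        goto out;

    count = 0;
    for (int i = 0; i < eh.e_phnum; i++) {
        if (ph[i].p_type != PT_DYNAMIC)
            continue;
        size_t n = ph[i].p_filesz / sizeof *dyn;
        if (n == 0)
            break;
        dyn = malloc(n * sizeof *dyn);
        if (dyn == NULL || fseek(f, (long)ph[i].p_offset, SEEK_SET) != 0 ||
            fread(dyn, sizeof *dyn, n, f) != n) {
            count = -1;
            break;
        }

        /* DT_STRTAB holds the virtual address of the dynamic string table. */
        long strtab = -1;
        for (size_t j = 0; j < n && dyn[j].d_tag != DT_NULL; j++)
            if (dyn[j].d_tag == DT_STRTAB)
                strtab = vaddr_to_offset(ph, eh.e_phnum, dyn[j].d_un.d_ptr);
        if (strtab < 0) {
            count = -1;
            break;
        }

        /* Each DT_NEEDED value is an offset into that string table. */
        for (size_t j = 0; j < n && dyn[j].d_tag != DT_NULL; j++) {
            if (dyn[j].d_tag != DT_NEEDED)
                continue;
            char name[256];
            if (fseek(f, strtab + (long)dyn[j].d_un.d_val, SEEK_SET) != 0)
                continue;
            size_t got = fread(name, 1, sizeof name - 1, f);
            name[got] = '\0';
            printf("NEEDED %s\n", name);
            count++;
        }
        break;
    }
out:
    free(dyn);
    free(ph);
    fclose(f);
    return count;
}
```

Running print_needed on a dynamically linked binary should print the library names that ldd resolves to full paths; the path lookup itself is done by ld-linux at run time and is not reproduced here.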

Alex Hoppus
  • Thank you. I guess I'll need to study more to fully understand this, but there is still one thing I'd like to ask. I get that shared objects (.so files) can be shared among many processes through dynamic linking. Does this mean that when each process mmaps new memory regions for a shared library, each process ends up using its own section of RAM to hold the same copy of the code (the shared library functions)? Am I correct? – do_os Jul 25 '15 at 08:32
  • @do_os Linux and other UNIX variants optimize this. While multiple processes each map the library's code region into their own virtual address space, the kernel keeps only one copy of the library's text segment in physical memory (even though it is mapped at different virtual addresses in different processes). Of course, the data segment cannot be shared, so each process gets its own. – Filipe Gonçalves Jul 25 '15 at 08:35
  • @do_os This happens not only with library text segments, but with pretty much anything that is mmapped read-only: as long as everyone maps something read-only, everyone can share the same underlying copy. – Filipe Gonçalves Jul 25 '15 at 08:36
  • @FilipeGonçalves ah... then I guess what I presumed in my original question seems to be partially right: the kernel has the original copy of the library ready in RAM, which can be shared with other processes (for text segments). Thank you for your insight. – do_os Jul 25 '15 at 08:53
  • @do_os Actually, every part of physical memory is ultimately managed by the kernel page allocator. So, to put the question that way: some parts of libc will be in the kernel page cache. When a new app accesses libc functionality, the kernel first looks in this page cache and sees that libc is already there. http://duartes.org/gustavo/blog/post/page-cache-the-affair-between-memory-and-files/ – Alex Hoppus Jul 25 '15 at 09:03
  • @AlexHoppus Thanks for the info on the page cache. I think this answers my question about how and where libc resides in RAM. – do_os Jul 25 '15 at 09:26
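
The read-only sharing described in these comments can be approximated inside a single process. This is a rough sketch, assuming a Linux system: it maps the same file twice and checks that the two mappings sit at different virtual addresses while holding the same bytes. It does not prove that the physical pages are shared (that would need /proc/self/pagemap and extra privileges), and map_twice_matches is a name invented for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map `path` read-only twice.
   Returns 1 if the two mappings have different virtual addresses but
   identical contents, 0 if not, and -1 on any error. */
int map_twice_matches(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    /* Two independent read-only mappings of the same file. */
    void *a = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    void *b = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mappings stay valid after the descriptor is closed */

    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, len);
        if (b != MAP_FAILED) munmap(b, len);
        return -1;
    }

    int ok = (a != b) && memcmp(a, b, len) == 0;
    munmap(a, len);
    munmap(b, len);
    return ok;
}
```

The same idea carries over to two separate processes mapping libc: each gets its own virtual addresses, while the kernel is free to back them with the same page-cache pages.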

ah... then I guess what I presumed in my original question seems to be partially right: the kernel has the original copy of the library ready in RAM, which can be shared with other processes (for text segments). Thank you for your insight.

You are even more right than you think :) Look at this: linux-vdso.so.1 => (0x00007fff4cfa6000). This is almost a "standard C library (basic functions like printf for example) ... also loaded on to low memory". Well, it is not in low memory :), it is not standard (in terms of C), and most of the time it is used by the C library rather than by your code directly. But yes: it is loaded by the kernel into userspace as a set of functions that are standard in the Linux context. http://man7.org/linux/man-pages/man7/vdso.7.html
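
As a small illustration of the vDSO being an ELF object that the kernel places in the process, the sketch below asks for its base address through the auxiliary vector. It assumes Linux with a libc that provides getauxval (glibc 2.16 or later, or musl); the function name vdso_has_elf_magic is made up here.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Returns 1 if the kernel reported a vDSO and it starts with the ELF magic,
   0 if it is reported but the magic does not match,
   -1 if no vDSO address is present in the auxiliary vector. */
int vdso_has_elf_magic(void) {
    /* AT_SYSINFO_EHDR carries the address of the vDSO's ELF header. */
    unsigned long base = getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return -1;
    return memcmp((const void *)base, ELFMAG, SELFMAG) == 0 ? 1 : 0;
}
```

On a typical Linux system this should return 1, and the address it inspects should fall inside the [vdso] line of /proc/self/maps.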