good explanation of __read_mostly, init, exit macros

Question

The macro expansion of __read_mostly :

#define __read_mostly __attribute__((__section__(".data..read_mostly")))

This one is from cache.h

__init:

#define __init          __section(.init.text) __cold notrace

from init.h

__exit:

#define __exit          __section(.exit.text) __exitused __cold notrace

After searching the net I have not found any good explanation of what is happening there.

Additional question: I have heard about various "linker magic" employed in kernel development. Any information regarding this would be wonderful.

I have some ideas about what these macros do. For example, __init is supposed to indicate that the function's code can be removed after initialization, and __read_mostly indicates that the data is seldom written, which minimizes cache misses. But I have no idea how they do it. They are gcc extensions, so in theory they can be demonstrated with a small userland C program.

UPDATE 1:

I have tried to test __section__ with an arbitrary section name. The test code:

#include <stdio.h>

#define __read_mostly __attribute__((__section__("MY_DATA")))

struct ro {
    char a;
    int b;
    char * c;
};

struct ro my_ro  __read_mostly = {
    .a = 'a',
    .b = 3,
    .c = NULL,
};


int main(int argc, char **argv) {
    printf("hello");
    printf("my ro %c %d %p \n", my_ro.a, my_ro.b, my_ro.c);
    return 0;
}

Now with __read_mostly the generated assembly code :

    .file   "ro.c"
.globl my_ro
    .section    MY_DATA,"aw",@progbits
    .align 16
    .type   my_ro, @object
    .size   my_ro, 16
my_ro:
    .byte   97
    .zero   3
    .long   3
    .quad   0
    .section    .rodata
.LC0:
    .string "hello"
.LC1:
    .string "my ro %c %d %p \n"
    .text
.globl main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    pushq   %rbx
    subq    $24, %rsp
    movl    %edi, -20(%rbp)
    movq    %rsi, -32(%rbp)
    movl    $.LC0, %eax
    movq    %rax, %rdi
    movl    $0, %eax
    .cfi_offset 3, -24
    call    printf
    movq    my_ro+8(%rip), %rcx
    movl    my_ro+4(%rip), %edx
    movzbl  my_ro(%rip), %eax
    movsbl  %al, %ebx
    movl    $.LC1, %eax
    movl    %ebx, %esi
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf
    movl    $0, %eax
    addq    $24, %rsp
    popq    %rbx
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (GNU) 4.4.6 20110731 (Red Hat 4.4.6-3)"
    .section    .note.GNU-stack,"",@progbits

Now without the __read_mostly macro the assembly code remains more or less the same.

this is the diff

--- rm.S    2012-07-17 16:17:05.795771270 +0600
+++ rw.S    2012-07-17 16:19:08.633895693 +0600
@@ -1,6 +1,6 @@
    .file   "ro.c"
 .globl my_ro
-   .section    MY_DATA,"aw",@progbits
+   .data
    .align 16
    .type   my_ro, @object
    .size   my_ro, 16

So essentially only a separate section is created, nothing fancy.

Even the objdump disassembly does not show any difference.

So my final conclusion is that it's the linker's job to do something with a data section marked with a special name. I think the Linux kernel uses some kind of custom linker script to achieve these things.

One thing about __read_mostly: the data placed there can be grouped and managed in a way that reduces cache misses.

Someone on lkml submitted a patch to remove __read_mostly, which spawned a fascinating discussion on the merits and demerits of __read_mostly.

here is the link : https://lkml.org/lkml/2007/12/13/477

I will post a further update on __init and __exit.

UPDATE 2

The macros __init, __exit and __read_mostly put data (in the case of __read_mostly) and text (in the case of __init and __exit) into custom named sections. These sections are then used by the linker. Because the linker's default behaviour is not sufficient here, a linker script is employed to achieve the purposes of these macros.

Some background on how a custom linker script can be used to eliminate dead code (code that is linked in but never executed) may help. This issue is very important in embedded scenarios. This document discusses how a linker script can be fine-tuned to remove dead code: elinux.org/images/2/2d/ELC2010-gc-sections_Denys_Vlasenko.pdf

In the kernel's case, the initial linker script can be found in include/asm-generic/vmlinux.lds.h. This is not the final script; it is a starting point, and the linker script is further modified for different platforms.

A quick look at this file immediately reveals the portions of interest:

#define READ_MOSTLY_DATA(align)                     \
    . = ALIGN(align);                       \
    *(.data..read_mostly)                       \
    . = ALIGN(align);

It seems this macro collects the ".data..read_mostly" input sections.

You can also find the linker commands related to the __init and __exit sections:

#define INIT_TEXT                           \
    *(.init.text)                           \
    DEV_DISCARD(init.text)                      \
    CPU_DISCARD(init.text)                      \
    MEM_DISCARD(init.text)

#define EXIT_TEXT                           \
    *(.exit.text)                           \
    DEV_DISCARD(exit.text)                      \
    CPU_DISCARD(exit.text)                      \
    MEM_DISCARD(exit.text)

Linking seems to be a pretty complex thing to do :)

any explanation from the guy who down-voted me would be nice. I can learn from my mistakes and apply this knowledge in my future questions. — Aftnix, Jul 16 '12 at 13:53
You are right: as far as `.init*` sections in the kernel modules are concerned, the module loader frees the memory they occupy after the module has completed its initialization. The loader also removes the entries for the symbols in these sections from the symbol table at that stage. — Eugene, Jul 17 '12 at 09:34
(continued) When processing a kernel module, the loader checks the names of the sections to determine how to process them. See, for example the code of [load_module()](http://lxr.free-electrons.com/source/kernel/module.c?v=3.4#L2865), etc., in the kernel sources. — Eugene, Jul 17 '12 at 09:40

ecatmur · Accepted Answer · 2012-07-16T15:29:00.697

GCC attributes are a general mechanism to give instructions to the compiler that are outside the specification of the language itself.

What the macros you list have in common is the use of the __section__ attribute, which is described as:

The section attribute specifies that a function lives in a particular section. For example, the declaration:
extern void foobar (void) __attribute__ ((section ("bar")));
puts the function foobar in the bar section.

So what does it mean to put something in a section? An object file is divided into sections: .text for executable machine code, .data for read-write data, .rodata for read-only data, .bss for data initialised to zero, etc. The names and purposes of these sections are a matter of platform convention, and some special sections can only be accessed from C using the __attribute__ ((section)) syntax.

In your example you can guess that .data..read_mostly is a subsection of .data for data that will be mostly read; .init.text is a text (machine code) section that will be run when the program is initialised, etc.

On Linux, deciding what to do with the various sections is the job of the kernel; when userspace requests to exec a program, it will read the program image section-by-section and process them appropriately: .data sections get mapped as read-write pages, .rodata as read-only, .text as execute-only, etc. Presumably .init.text will be executed before the program starts; that could either be done by the kernel or by userspace code placed at the program's entry point (I'm guessing the latter).

If you want to see the effect of these attributes, a good test is to run gcc with the -S option to output assembler code, which will contain the section directives. You could then run the assembler with and without the section directives and use objdump or even hex dump the resulting object file to see how it differs.

I'm going to do some experimentation with the attributes listed here and will post subsequent update. +1 for this nice organized answer. — Aftnix, Jul 16 '12 at 15:28

artless noise · Answer 2 · 2019-04-25T16:12:40.063

As far as I know, these macros are used exclusively by the kernel. In theory, they could apply to user space, but I don't believe this is the case. They all group similar variables and code together for different effects.

init/exit

A lot of code is needed to set up the kernel; this happens before any user space is running at all, i.e. before the init task runs. In many cases, this code is never used again, so it would be a waste for it to consume un-swappable RAM after boot. The familiar kernel message "Freeing init memory" is a result of the init section. Some drivers may be configured as modules. In these cases, they exit. However, if they are compiled into the kernel, they don't necessarily exit (they may shut down). This is another section to group this type of code/data.

cold/hot

Each cache line has a fixed size. You can get the most out of a cache by putting the same type of data/function in it. The idea is that often-used code can go side by side. If a cache line holds four instructions, the end of one hot routine should merge with the beginning of the next hot routine. Similarly, it is good to keep seldom-used code together, as we hope it never goes in the cache.

read_mostly

The idea here is similar to hot; the difference is that with data we can update the values. When that happens, the entire cache line becomes dirty and must be written back to main RAM. This is needed for multi-CPU consistency and when that cache line goes stale. If nothing differs between the CPU cache version and main memory, then nothing needs to happen on an eviction. This frees up the RAM bus so that other important things can happen.

These items are strictly for the kernel. Similar tricks could be (are?) implemented for user space. That would depend on the loader in use, which often differs depending on the libc in use.

Some of these ideas can be (and are) applied to user space. For instance, functions may be in separate files, yet call each other. With *LTO* (link-time optimization), the functions may be placed beside each other. With user space you have L1/L2 cache, but also paging of text, which takes place in 4k chunks. People have used this idea to optimize loading of Qt applications. Linux is lazy in loading code and will fetch from (NAND) disk as the code executes. Grouping early Qt init code together will minimize the 4k page loads. Hot and cold are attributes supported for userspace (`.text.hot`...) — artless noise, Oct 27 '15 at 16:04
