
I was trying to work around the problem that "kallsyms_lookup_name is not exported anymore in kernels > 5.7", and found a solution at: https://github.com/xcellerator/linux_kernel_hacking/issues/3.

It says that "the kernel functions are all aligned so that the final nibble is 0x0". Why is that the case?

HnlyWk

2 Answers


Most importantly, this is not true in general. Some architectures have other alignment criteria, and the reason why is often difficult to pin down. It is entirely possible for Linux not to do this, and it could even change between kernel versions for the same architecture.


A low nibble of zero means 16-byte alignment. It is enforced by the linker and the compiler (via CPU and/or ABI restrictions/efficiencies). Generally, you want a function to start on a cache line boundary, so that functions do not overlap with each other in the cache. It also makes the cache easier to fill, i.e. the expectation is that only the last line of the function is partially used. Branch targets, case labels and other constructs may also be aligned, depending on the CPU.
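The "low nibble is zero" statement is just the usual power-of-two alignment test. A minimal sketch of that arithmetic follows; the helper names `is_power_of_two`, `is_aligned` and `align_up` are made up for illustration and are not kernel APIs (the kernel has its own macros for this).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `align` is a non-zero power of two. */
static bool is_power_of_two(uintptr_t align)
{
    return align != 0 && (align & (align - 1)) == 0;
}

/* True if `addr` is a multiple of `align` (which must be a power of two).
 * For align == 16 this tests exactly that the low nibble is 0x0. */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(is_power_of_two(align));
    return (addr & (align - 1)) == 0;
}

/* Round `addr` up to the next multiple of `align` (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(is_power_of_two(align));
    return (addr + align - 1) & ~(align - 1);
}
```

For example, `is_aligned(0x1230, 16)` holds while `is_aligned(0x1238, 16)` does not, and `align_up(0x1231, 16)` gives `0x1240`.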

Even without a cache, primary SDRAM is read in bursts. 16 bytes seems like a reasonable trade-off between the overhead of alignment (wasted bytes) and efficiency (fewer SDRAM cycles). Of course, the SDRAM burst size and the cache line size are of the same order, as they work together to get code to the CPU's decode units.

There can be other reasons for alignment, such as hardware and internal tables that only use a subset of the address bits. This alignment may apply to external functions only. Some instructions operate faster (or are only possible) on aligned data, so some kernel function instrumentation may also benefit from the alignment (either through more compact tables or by adding veneers, etc.).

See: x86_64 stack alignment - where many of the rationales for the stack alignment can apply to code as the kernel can on occasion treat code as data.

artless noise
  • *"This is 16byte alignment. It is enforced by the linker and the compiler (via CPU and ABI restrictions)"* - no it isn't? Are you sure? I don't think there is any CPU or ABI restriction on alignment of ***functions***. Stack/data alignment is a different story. – Marco Bonelli Jun 24 '22 at 13:42
  • Yes, I guess "restriction" is too strong. I meant performance benefits as well; updated. Also, what architecture are you talking about? I will highlight that I don't think this is generally true (as per my last paragraph, moved to the start for emphasis). Thanks. As for the reference to the stack (which may have misled you), the two can be related if you use some feature that mixes them. It is probably more common in user space, with tricks to implement veneers before a function executes. Try `find . -name '*lds*' -exec grep ALIGN {} \; -print` in the source to see ARCH variations. – artless noise Jun 24 '22 at 13:48
  • I don't think looking at `ALIGN` directives inside `.lds` files makes much sense. Linker scripts specify alignment of sections, not code inside them. - Anyway, on both x86 and ARM64 there is no "requirement" by CPU or ABI on function alignment, that's what I was talking about. It's definitely a compiler optimization to align functions. – Marco Bonelli Jun 24 '22 at 14:08
  • My point on the alignment scripts is it is different per architecture (`grep -l` would have been better). So, alignment of functions for performance can also be different per architecture. There are instruction alignment requirements for CPU instructions. If something is using self-modifying code, then it could matter. For instance, kprobes does make modifications to routines. Anyways, I think we have proved that **why** is a tough question. – artless noise Jun 24 '22 at 20:18
  • So setting 16-byte alignment is for a performance benefit (cache line and SDRAM), and it's not always 16 bytes, especially across different architectures. Learnt a lot! Thanks a lot, Artless Noise! – HnlyWk Jun 25 '22 at 03:19

In general: no, kernel functions aren't always aligned to 16 bytes. Saying "the kernel functions are all aligned so that the final nibble is 0x0" is wrong. However, in the most common case, which is Linux x86-64 compiled with GCC with the default kernel compiler flags, this happens to be true. Take some other case, for example the default config for ARM64, and you'll see that this does not hold.

The kernel itself does not specify any alignment for functions, but the compiler optimizations that it enables can (and will) align functions.

Alignment of functions is in fact a compiler optimization that on GCC is enabled using -falign-functions=. According to the GCC doc, compiling with at least -O2 (selected by CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y, which is the default) will enable this optimization without an explicit value set. This means that the actual alignment value is chosen by GCC based on the architecture. On x86, the default is 16 bytes for the "generic" machine type (-march=x86-64, see doc).
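To see the effect in user space, here is a small sketch. It deliberately uses the per-function `aligned` attribute (supported by GCC and Clang) instead of the -falign-functions= flag, so the result does not depend on the optimization level used to build it. The names `probe_a`, `probe_b`, `low_nibble` and `check_probes` are hypothetical, and the sketch assumes a target such as x86-64 or AArch64 where a function pointer is the plain entry address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical probe functions. aligned(16) requests for these two functions
 * the start-address alignment that -falign-functions=16 asks for globally. */
__attribute__((aligned(16))) int probe_a(int x) { return x + 1; }
__attribute__((aligned(16))) int probe_b(int x) { return x * 2; }

/* Low four bits of a function's entry address. */
static unsigned low_nibble(int (*fn)(int))
{
    return (unsigned)((uintptr_t)fn & 0xf);
}

/* Aborts if either probe starts off a 16-byte boundary. */
static void check_probes(void)
{
    assert(low_nibble(probe_a) == 0);
    assert(low_nibble(probe_b) == 0);
}
```

Dropping the attribute and building with and without -O2 is one way to compare what the compiler picks by default on a given target.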

Clang also supports -falign-functions= since version 6.0.1 according to their repository (it was previously ignored), though I am not sure if it is enabled or not at different optimization levels.


Why is this an optimization? Well, alignment can offer performance advantages. In theory, cache line alignment would be "optimal" for cache performance, but there are other factors to consider: aligning to 64 bytes (cache line size on x86) would probably waste a lot of space for no good reason without improving performance that much. See How much does function alignment actually matter on modern processors?
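The space cost mentioned above can be estimated with simple arithmetic. The following sketch assumes functions are laid out back to back, each starting on an aligned boundary, so the padding after a function depends only on its size. The helper names and the function sizes in the usage note are made up for illustration, not measured from a kernel build.

```c
#include <stddef.h>

/* Bytes of padding needed after a function of `size` bytes so that the
 * next function starts on an `align`-byte boundary. */
static size_t padding_after(size_t size, size_t align)
{
    return (align - size % align) % align;
}

/* Total padding for `n` functions laid out back to back. */
static size_t total_padding(const size_t *sizes, size_t n, size_t align)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += padding_after(sizes[i], align);
    return total;
}
```

With hypothetical sizes of 23, 40, 64 and 100 bytes, 16-byte alignment costs 29 bytes of padding, while 64-byte alignment costs 93.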

Marco Bonelli
  • It's an excellent answer! I tried to find some clues in the kernel code last night, but it was too hard to find anything. I suspected it might be caused by GCC's -falign-functions, but I couldn't grep any direct proof. Now I finally know it's set by the option CC_OPTIMIZE_FOR_PERFORMANCE (-O2). Thank you for the professional and patient answer! – HnlyWk Jun 25 '22 at 03:04
  • Doesn’t the architectural ABI require an alignment for data/instructions? It might also be a requirement imposed by how the `call` and `jmp` instructions work. No? – 0andriy Jun 27 '22 at 21:48
  • @0andriy it may very well be on some architectures, I'm not saying that there isn't any such requirement. It's just not the reason for the 0x10 alignment. – Marco Bonelli Jun 28 '22 at 09:04