Is there an actual example where inline is detrimental to the performance of a C program?

Question

In many debates about the inline keyword in function declarations, someone will point out that it can actually make your program slower in some cases, mostly due to code explosion, if I am correct. I have never met such an example in practice myself. What is an actual piece of code where the use of inline can be expected to be detrimental to performance?

Note that `inline` is just a suggestion; the compiler is free to ignore it. — dan04, Jun 27 '14 at 13:09
Have many compilation units all include the same big inline method and use it only once, then you might have a degenerate example... Still, depends on not doing link-time-code-generation or such things. — Deduplicator, Jun 27 '14 at 13:20
As @dan04 said, it is not required for the compiler to listen to you, and generally it is pretty smart. — rjp, Jun 27 '14 at 13:27
A lot of places, for example max function ('int max(int a, int b)') could just be inline (also can be defined as macro actually and will have the same effect). The program won't 'JUMP' to that function in memory and won't load the stack with the variables. TONS of uses. — Zach P, Jun 27 '14 at 14:08
Of course inline can be detrimental - if not, the compiler (or developer) would just always inline everything, all the time. It increases register pressure and code size / L1i cache footprint, either of which can easily be bigger performance issues than function call overhead. TANSTAAFL. — Jonathan Dursi, Jun 27 '14 at 15:49

score 20 · Answer 1 · answered Jun 27 '14 at 13:38

Exactly 10 years and one day ago I made this commit in OpenBSD:

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/amd64/include/intr.h.diff?r1=1.3;r2=1.4

The commit message was:

deinline splraise, spllower and setsoftint. Makes the kernel smaller and faster. deraadt@ ok

As far as I remember, the kernel binary shrank by more than 100kB. Not a single test case could be produced that became slower, and several macro benchmarks (like compiling the kernel) were measurably faster (5-10% if I recall correctly, but don't quote me on that).
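The general shape of such a change can be sketched as follows. This is not the OpenBSD code: `current_level`, `raise_level_inline` and `raise_level_outofline` are hypothetical names, and the GCC/Clang `noinline` attribute is assumed here as a stand-in for moving the body out of the header into a single .c file.

```c
/* Hypothetical stand-in for an interrupt priority level. */
static volatile int current_level = 0;

/* Before: a static inline definition in a header. Every call site in
 * every file that includes the header may get its own copy of the body. */
static inline int raise_level_inline(int new_level)
{
    int old = current_level;
    if (new_level > old)
        current_level = new_level;
    return old;
}

/* After: one out-of-line copy shared by all callers. In a real tree the
 * header would keep only the prototype; noinline (a GCC/Clang extension)
 * approximates that separation within a single file. */
__attribute__((noinline)) int raise_level_outofline(int new_level)
{
    int old = current_level;
    if (new_level > old)
        current_level = new_level;
    return old;
}
```

Both versions behave identically; the only difference is whether the body is duplicated at each call site, which is what affects the size of the binary.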

Around the same time I went on a quest to actually measure inline functions in the OpenBSD kernel. I found a few that gave minimal performance gains, but the majority had no measurable impact, and several made things much slower and were killed. At least one more uninlining had a huge impact: removing the internal malloc macros (where the idea was to inline malloc if the size was known at compile time) and the packet buffer allocators shrank the kernel by 150kB and gave a significant performance improvement.
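To illustrate the kind of macro described here, below is a minimal sketch, not the actual kernel code. The names `bucket_index`, `alloc_from_bucket`, `ALLOC_INLINE` and `alloc_outofline` are all hypothetical, and the power-of-two bucket scheme is an assumption made only for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: map a request size to a power-of-two bucket index.
 * Assumes size is small enough that the shift cannot overflow. */
static inline unsigned bucket_index(size_t size)
{
    unsigned idx = 4;                 /* smallest bucket: 16 bytes */
    while (((size_t)1 << idx) < size)
        idx++;
    return idx;
}

/* Hypothetical back end; a real allocator would use per-bucket free lists. */
void *alloc_from_bucket(unsigned idx)
{
    return malloc((size_t)1 << idx);
}

/* Inlined variant: the bucket computation is expanded at every call site.
 * With a constant size the compiler can fold it away, but any code that
 * cannot be folded is duplicated wherever the macro is used. */
#define ALLOC_INLINE(size) alloc_from_bucket(bucket_index(size))

/* Out-of-line variant: one shared copy of the whole path. */
void *alloc_outofline(size_t size)
{
    return alloc_from_bucket(bucket_index(size));
}
```

The inlined form saves a little work per call when the size is constant, at the cost of repeating code at every use, which is the trade-off the measurements above came down against.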

One could speculate, although I have no proof, that this is because the kernel is large and we're struggling to stay inside the cache when executing system calls and every little bit helps. So what actually helped in those cases was just the shrinking of the binary, not the number of instructions executed.

Similar changes removing inline were made years ago in the Linux kernel as well. — hlovdal, Jun 27 '14 at 14:01

score 2 · Answer 2 · answered Aug 21 '15 at 21:19

Imagine a function that takes no parameters but performs an intensive computation with a substantial number of intermediate values, and therefore heavy register usage. Then inline that function into code that also has a substantial number of intermediate values or heavy register usage.

Having no parameters makes the call cheaper, because no time-consuming stack operations are needed to pass arguments.

When the function is inlined, the compiler has to save many registers and spill others so that they can be used by the inlined code, reproducing the register and data backup required for a function call, possibly in a worse way.

If these backup operations are more expensive, in terms of time and machine cycles, than the function call mechanism, especially when the function is called frequently, then inlining has a detrimental effect.

This seems to be the case for some specific functions that are heavily used in an OS.
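The situation described in this answer might look like the sketch below. The names `table`, `mix_table` and `accumulate` are hypothetical, and `noinline` is a GCC/Clang extension assumed here as the way to keep the helper out of line. Whether inlining would actually cause spills depends on the target and the compiler, so this shows only the structure, not a measured slowdown.

```c
#include <stddef.h>

/* Hypothetical data read by the parameterless helper. */
static unsigned table[8] = {1, 2, 3, 4, 5, 6, 7, 8};

/* No parameters, many live intermediate values. Kept out of line so its
 * register needs do not compete with those of the caller's loop. */
__attribute__((noinline)) unsigned mix_table(void)
{
    unsigned a = table[0], b = table[1], c = table[2], d = table[3];
    unsigned e = table[4], f = table[5], g = table[6], h = table[7];
    a += b * c;
    d ^= e + f;
    g += h * a;
    b ^= d + g;
    return a + b + c + d + e + f + g + h;
}

/* Caller that keeps several accumulators live across each iteration.
 * Processes elements in groups of four; leftovers are ignored. */
unsigned accumulate(const unsigned *v, size_t n)
{
    unsigned s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (size_t i = 0; i + 3 < n; i += 4) {
        s0 += v[i] + mix_table();
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    return s0 + s1 + s2 + s3;
}
```

Comparing the generated assembly with and without the attribute (for example with `-S`) is one way to check whether the inlined version introduces extra spills on a given target.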
