7

When you mark a function as inline, you give the compiler a hint that this function is a candidate for inlining. The compiler can still decide that inlining is not a good idea and ignore the hint.

  1. Is there a way to see if the function gets inlined or not, without using the disassembler? Is there some compiler warning that I don't know about maybe?

  2. What are the rules for inlining that the compiler uses? Are there constructs that cause a function to never get inlined for example?

Wouter van Nifterick

2 Answers

8

The compiler emits a hint if it can't inline your function. The documentation explains the rules for what can and cannot be inlined.

As for the discretionary decisions that the compiler takes as to whether or not to inline (as opposed to whether or not inlining is possible), they are not documented and can be considered an implementation detail.
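As a point of reference, here is a minimal sketch of how a routine is offered to the compiler for inlining. `Clamp01` is a hypothetical helper and not taken from the question. The `{$INLINE}` directive values shown in the comments are the documented ones, but whether this particular call is actually expanded remains the compiler's decision.

```delphi
program InlineSketch;

{$APPTYPE CONSOLE}

// {$INLINE ON} is the default: routines marked "inline" are expanded
// wherever the compiler considers it possible.
// {$INLINE AUTO} additionally lets the compiler inline small unmarked
// routines; {$INLINE OFF} disables inline expansion.
{$INLINE ON}

// Hypothetical helper (an assumption for illustration only).
// The "inline" directive is a request, not a guarantee.
function Clamp01(const X: Double): Double; inline;
begin
  if X < 0.0 then
    Result := 0.0
  else if X > 1.0 then
    Result := 1.0
  else
    Result := X;
end;

begin
  // The body must be known before this call site for expansion to be possible.
  Writeln(Clamp01(1.5):0:1);
end.
```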

I recall that you recently commented on one of my answers to a different question that a particular function was 10 times faster once inlined. Clearly you are interested in inlining but in that particular case I cannot believe such an enormous gain for a function with so many floating point operations. I suspect that inlining is not actually giving you the performance improvements that you think it does.

David Heffernan
  • Yep, inlining is not the panacea many people think it is. We have profiled various types of methods with various result/parameter types, with various inline directives, and playing with local and non-local functions and procedures. We have seen both performance improvements and **degradations** on inlined methods. Unfortunately the figures didn't allow any clear dos and don'ts, except that your best bet is to set inline to auto and let the compiler do its work, and only use other inline directives when profiling shows the compiler got it wrong. – Marjan Venema Feb 17 '11 at 17:26
  • The other thing that's nasty about inlining is what it does to debugging. – David Heffernan Feb 17 '11 at 17:27
  • The 10x factor was actually measured with a test program. Try it yourself if you still cannot believe it: http://www.xs4all.nl/~niff/stuff/inlinetest.dpr . (on my I7 at home it was 10x, and admittedly, on my P4 at work it's about 7x). Still, if it's the inner loop of something that gets called often, such a performance gain should be well worth considering. – Wouter van Nifterick Feb 17 '11 at 17:38
  • @Wouter I changed your code to use random values as opposed to the same hard coded constants of 0 each time around the loop and the result is that the inline performance is the same as not inlining. I assigned the result of a call to `Random` to 6 local variables which were then passed to the function. I think in any plausible real-world scenario inlining will not help this particular function. – David Heffernan Feb 17 '11 at 18:11
  • @Wouter: Passing zeros as parameters in this case makes the optimizer simplify the expression to basically "abs(0) < tolerance" which runs faster than the original code. Also when timing functions without side-effects it is important to use the return value (by assigning to a global variable) otherwise the code might be optimized away completely. – Ville Krumlinde Feb 17 '11 at 18:47
  • Bottom line is this: it's 2011; calling functions is no longer expensive; stop worrying about inlining things, it makes precious little difference – David Heffernan Feb 17 '11 at 18:49
  • @Marjan: if inlining makes it slower there are probably a lot of variables. Consider a non-inlined function that can use registers for all the variables that are used in a loop. That's fast. If it's inlined there are more variables (the ones that are in the 'parent' routine). The loop variables may be moved to memory and it gets slower.... – Giel Feb 17 '11 at 19:21
  • @Giel @Marjan @Wouter @Ville I'll inline something like `function IsZero: Boolean` which calls an overloaded `=` operator. That's as complex as I ever contemplate for an inline. – David Heffernan Feb 17 '11 at 19:27
  • I just did the same, and the inlined version is still consistently faster, although the effect is considerably smaller indeed (2756 vs 2280 ms). But although I didn't intend this behavior initially, I found out that something even cooler is going on: In the version with the *hard coded* parameters, the compiler solves the constant part of the equation in the function, and only makes that absolute and compares it with tolerance, which I find pretty neat. That doesn't happen with the non-inlined version. – Wouter van Nifterick Feb 17 '11 at 19:36
  • @Wouter It is pretty cool, but sadly not very helpful! For your times, are you getting new values for the parameters each time around the loop? You really ought to if you want to simulate realistic use of this function. If I assign random values outside the for loop then inline is faster. If I assign inside the loop then they are the same. There's not much point calculating the same thing more than once! – David Heffernan Feb 17 '11 at 19:38
  • @Giel: That may have been a factor, but the methods we used were contrived and therefore particularly simple because we were mostly testing some other hacks (reusing a local var instead of declaring two, and using different result types). So the number of vars really can't have played that much of a role, as the total number of vars and parameters will have been five (if that) or less... – Marjan Venema Feb 17 '11 at 19:43
  • @David: you are selling yourself short not using inlining for more ;-) That said, you really need to be comfortable with assembler code and really inspect what the compiler generates in order to assess the effect on performance. The compiler does a good job most of the time, but takes wrong turns as well. Luckily I have a colleague whose middle name seems to be "Assembler" (or "Register" for that matter)... – Marjan Venema Feb 17 '11 at 19:47
  • @Marjan My code spends most of its time in double precision vector and matrix algorithms that I wrote using inline assembly. They can't be inlined. Inlining the rest wouldn't help because the rest is not the bottleneck. – David Heffernan Feb 17 '11 at 19:50
  • @David: Inlining making precious little difference? We can hardly survive without it. But then, I guess our apps, and especially our server, are not exactly your run-of-the-mill types of applications. Our server could be described as an in-memory database server (though that is doing it a great injustice) often loading models of over 120 GB into memory. Try doing that and being responsive in serving information to your connected clients without inlining... It ain't gonna happen... – Marjan Venema Feb 17 '11 at 19:56
  • @Marjan Our code is all 8087 FPU stuff. Most of the time consuming functions have tens or hundreds of FLOPS. Inlining is not going to help. – David Heffernan Feb 17 '11 at 19:58
  • @Marjan It's not very advanced inline assembly. Mostly the same as what dcc32 produces but without prolog/epilog stack treatment and only one FWAIT per function! – David Heffernan Feb 17 '11 at 19:59
  • I've made some major performance improvements just by inlining functions (most noticeable in map drawing and route planner routines). Agreed, it's only useful in rare cases, but we don't need to spread FUD about it. Before we know it, it becomes the new Delphi dogma, after the `with` statement. – Wouter van Nifterick Feb 17 '11 at 20:08
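The benchmarking advice in the comments above (fresh inputs on every iteration, and a result that is actually consumed) can be sketched roughly as follows. This is not the original `inlinetest.dpr`: `NearPlain`, `NearInline`, `Sink`, the iteration count and the tolerance are all hypothetical stand-ins, and the `System.Diagnostics` unit-scope name assumes Delphi XE2 or later (older versions would use `Diagnostics`). Note that the cost of `Random` is included in both timings, so the measured difference will be diluted.

```delphi
program InlineBench;

{$APPTYPE CONSOLE}

uses
  System.Diagnostics; // TStopwatch; unit-scope name assumes XE2 or later

const
  Iterations = 50000000; // arbitrary count chosen for illustration

var
  Sink: Integer = 0; // global sink, so the results are observably used

// Hypothetical stand-ins for the function under test.
function NearPlain(const A, B, Tol: Double): Boolean;
begin
  Result := Abs(A - B) < Tol;
end;

function NearInline(const A, B, Tol: Double): Boolean; inline;
begin
  Result := Abs(A - B) < Tol;
end;

procedure Run;
var
  I: Integer;
  A, B: Double;
  SW: TStopwatch;
begin
  SW := TStopwatch.StartNew;
  for I := 1 to Iterations do
  begin
    A := Random; // new inputs inside the loop, not hard-coded constants
    B := Random;
    if NearPlain(A, B, 0.25) then
      Inc(Sink);
  end;
  Writeln('not inlined: ', SW.ElapsedMilliseconds, ' ms');

  SW := TStopwatch.StartNew;
  for I := 1 to Iterations do
  begin
    A := Random;
    B := Random;
    if NearInline(A, B, 0.25) then
      Inc(Sink);
  end;
  Writeln('inlined:     ', SW.ElapsedMilliseconds, ' ms');
end;

begin
  Randomize;
  Run;
  Writeln('sink = ', Sink); // printing the sink keeps the work from being discarded
end.
```

Moving the `Random` calls outside the loops, or replacing them with constants, reproduces the situation discussed above, where the optimizer can simplify the inlined expression and the comparison stops reflecting realistic use.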
3

You can look at the blue dots in the gutter after building the project. If there are blue dots next to a function, there is at least one place where it has not been inlined.

I don't think you can rely on hints emitted by the compiler. It tells you when it's not inlined because the file the function lives in isn't in the interface uses clause. If it's because of other reasons it typically doesn't tell you.

Giel