
Say you see a loop like this one:

for(int i=0;
    i<thing.getParent().getObjectModel().getElements(SOME_TYPE).count();
    ++i)
{
  thing.getData().insert(
    thing.getData().count(),
    thing.getParent().getObjectModel().getElements(SOME_TYPE)[i].getName()
    );
}

If this were Java I'd probably not think twice. But in performance-critical sections of C++ it makes me want to tinker with it... however, I don't know if the compiler is smart enough to make that futile. This is a made-up example, but all it's doing is inserting strings into a container. Please don't assume any of these are STL types; think in general terms about the following:

  • Is that messy condition in the for loop going to be evaluated each time, or only once?
  • If those get methods are simply returning references to member variables on the objects, will they be inlined away?
  • Would you expect custom [] operators to get optimized at all?

In other words, is it worth the time (in terms of performance only, not readability) to convert it to something like:

ElementContainer &source = 
   thing.getParent().getObjectModel().getElements(SOME_TYPE);
int num = source.count();
Store &destination = thing.getData();
for(int i=0;i<num;++i)
{
  destination.insert(destination.count(), source[i].getName());
}

Remember, this is a tight loop, called millions of times a second. What I wonder is whether all this will shave just a couple of cycles per loop, or save something more substantial.


Yes, I know the quote about "premature optimisation", and I know that profiling is important. But this is a more general question about modern compilers, Visual Studio in particular.

Mr. Boy
  • I think as it is, it is a bit of code smell anyway ("message chain"), and in any case it might make sense to do something about it. – UncleBens Mar 31 '10 at 14:42
  • Saying this is a "general question" doesn't change the fact that this is yet another attempt to optimize without profiling. An exercise in futile guesswork in other words. – Edward Strange Mar 31 '10 at 16:05
  • Sorry, you're wrong. If you know what you're doing you should be able to tell me whether this code can sensibly be optimized, regardless of how much time is spent in it. – Mr. Boy Mar 31 '10 at 16:26
  • The compiler might hoist invariants out of the loop for you, but that only (maybe) helps performance. The second example is easier to read because you give meaningful names to the long expressions. A compiler will never make the code more readable. – Adrian McCarthy Apr 16 '10 at 16:11

6 Answers

4

The general way to answer such questions is to look at the produced assembly. With gcc, this involves replacing the -c flag with -S.

My own rule is not to fight the compiler. If something should be inlined, I make sure that the compiler has all the information it needs to perform that inlining, and (possibly) I try to nudge it with an explicit inline keyword.
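For example, a minimal sketch (using hypothetical Thing/Store types, not the asker's real ones) of keeping trivial accessors visible so the compiler can inline them, together with the flag for dumping the assembly to check:

// Hypothetical types, for illustration only.
class Store {
  int count_ = 0;
public:
  // Defined in the class body, so the definition is visible at every
  // call site and the compiler is free to inline it.
  int count() const { return count_; }
};

class Thing {
  Store data_;
public:
  Store &getData() { return data_; }   // likewise trivially inlinable
};

// To inspect what the compiler actually did:
//   g++ -O2 -S thing.cpp -o thing.s   (emit assembly instead of an object file)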

Also, inlining saves a few opcodes but makes the code grow, which, as far as L1 cache is concerned, can be very bad for performance.

Thomas Pornin
2

All the questions you are asking are compiler-specific, so the only sensible answer is "it depends". If it is important to you, you should (as always) look at the code the compiler is emitting and do some timing experiments. Make sure your code is compiled with all optimisations turned on - this can make a big difference for things like operator[](), which is often implemented as an inline function, but which won't be inlined (in GCC at least) unless you turn on optimisation.
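As a hedged illustration (a made-up ElementContainer, not any particular library), an operator[] that is just a thin inline accessor typically compiles down to a single indexed load once optimisation is on, but stays a real function call at -O0:

#include <cstddef>

// Made-up container, for illustration only.
class ElementContainer {
  const char **names_ = nullptr;
  std::size_t size_ = 0;
public:
  // Inline in the class body: at -O2 a call like c[i] usually becomes a
  // plain "load names_[i]"; without optimisation GCC leaves it as a call.
  const char *operator[](std::size_t i) const { return names_[i]; }
  std::size_t count() const { return size_; }
};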

1

If the loop is that critical, I can only suggest that you look at the generated code. If the compiler is allowed to aggressively optimise the calls away, then perhaps it will not be an issue. Sorry to say this, but modern compilers can optimise incredibly well, and I really would suggest profiling to find the best solution in your particular case.

Preet Sangha
1

If the methods are small and can and will be inlined, then the compiler may do the same optimizations that you have done. So, look at the generated code and compare.

Edit: It is also important to mark methods that do not modify the object as const; e.g. in your example, count() and getName() should be const to let the compiler know that these methods do not alter the contents of the object they are called on.
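A sketch of what that looks like (hypothetical Element/ElementContainer declarations, just to show where the const goes):

// Hypothetical declarations, for illustration only.
class Element {
public:
  const char *getName() const;            // promises not to modify *this
};

class ElementContainer {
public:
  int count() const;                      // same promise for the container
  const Element &operator[](int i) const;
};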

Frunsi
  • The note about const is not correct. const makes no such guarantee: a const method may still modify mutable members, or constness may be "cast away". – Suma Mar 31 '10 at 13:33
  • @Suma: yes, true, but at least it is a hint for the compiler. Maybe in practice the compiler ignores const for optimizations... – Frunsi Mar 31 '10 at 18:08
1

As a rule, you should not have all that garbage in your "for condition" unless the result is actually going to change during loop execution.

Use a variable set outside the loop instead. This will eliminate the WTF when reading the code, it will not negatively impact performance, and it will sidestep the question of how well those functions get optimized. If the calls are not optimized away, it will also result in a performance increase.

phkahler
  • ++ My sentiments exactly. I would tend to do that even before I know it's actually a performance problem. BTW your self-description sounds really interesting. – Mike Dunlavey Mar 31 '10 at 19:00
0

I think in this case you are asking the compiler to do more than it legitimately can given the scope of compile-time information it has access to. So, in particular cases the messy condition may be optimized away, but really, the compiler has no particularly good way to know what kind of side effects you might have from that long chain of function calls. I would assume that breaking out the test would be faster unless I have benchmarking (or disassembly) that shows otherwise.
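As a hypothetical illustration (made-up ObjectModel, not the asker's real type): if any call in the chain has an observable side effect, the compiler is obliged to re-evaluate it on every iteration and cannot hoist it for you.

#include <cstdio>

// Made-up object model, for illustration only.
class ObjectModel {
  int queries_ = 0;
public:
  // Observable side effect: two calls are not interchangeable, so the
  // compiler must keep this inside the loop condition.
  int elementCount() { ++queries_; std::printf("query #%d\n", queries_); return 42; }
};

// for(int i=0; i<model.elementCount(); ++i) { ... }   // re-queried every pass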

This is one of the cases where a JIT compiler has a big advantage over a C++ compiler. It can, in principle, optimize for the most common case seen at runtime and generate optimized code for that case (plus guards to make sure execution actually falls into that case). This sort of thing is done all the time for polymorphic method calls that turn out not to actually be used polymorphically; whether it could catch something as complex as your example, though, I'm not certain.

For what it's worth, if speed really mattered, I'd split it up in Java too.

Rex Kerr
  • Polymorphic inline caching of any self-respecting JIT will optimize this example easily. While in principle static C++ compilers could do PICs with whole program optimization and profile feedback, I'm not aware of any production compilers that would do it. – Ants Aasma Apr 01 '10 at 15:39