12

In heavy loops, such as those found in game applications, many factors can decide which part of the loop body is executed (for example, a character object will be updated differently depending on its current state), so instead of writing:

void my_loop_function(int dt) {
  if (conditionX && conditionY) 
    doFoo();
  else
    doBar();

  ...
}

I am used to using a function pointer that points to a certain logic function corresponding to the character's current state, as in:

void (*updater)(int);

void something_happens() {
  updater = &doFoo;
}
void something_else_happens() {
  updater = &doBar;
}
void my_loop_function(int dt) {
  (*updater)(dt);

  ...
}

And in the case where I don't want to do anything, I define a dummy function and point to it when I need to:

void do_nothing(int dt) { }

Now what I'm really wondering is: am I obsessing about this needlessly? The example given above is of course simple; sometimes I need to check many variables to figure out which pieces of code to execute, so I figured that using these "state" function pointers would be both faster and, to me, more natural. But a few people I'm dealing with heavily disagree.

So, is the gain from using a (virtual) function pointer worth it, compared to filling my loops with conditional statements to direct the logic?

Edit: to clarify how the pointer is being set, it's done through event handling on a per-object basis. When an event occurs and, say, that character has custom logic attached to it, it sets the updater pointer in that event handler until another event occurs which will change the flow once again.
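To make the per-object, event-driven setup concrete, here is a minimal sketch; the names (Character, chase, on_enemy_spotted, and so on) are invented for illustration, not taken from the actual code:

```cpp
// Hypothetical sketch: each object carries its own updater pointer and
// event handlers swap it, so the per-frame loop never tests flags.
struct Character;
typedef void (*Updater)(Character&, int);

struct Character {
    Updater updater;
    int x;          // position, just to show an observable effect
};

void idle(Character&, int) { }                   // the "do nothing" state
void chase(Character& c, int dt) { c.x += dt; }  // stand-in for real logic

// Event handlers change the flow; nothing else does.
void on_enemy_spotted(Character& c) { c.updater = &chase; }
void on_enemy_lost(Character& c)    { c.updater = &idle;  }

void my_loop_function(Character& c, int dt) {
    c.updater(c, dt);   // single indirect call per object
}
```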

Thank you

amireh
  • Have you experienced performance issues yet ? My feeling here is that you are wondering about **premature optimization**. You should write your logic in a maintainable way. If later on you experience performance issues, it will still be time to profile your code and optimize it. – ereOn Sep 19 '11 at 06:56
  • 3
    write clear code. measure b4 optimize. – Cheers and hth. - Alf Sep 19 '11 at 06:59
  • The beautiful thing about our profession is that you don't have to believe in a solution, and there is no room for subjectivity. You can actually *measure* the performance. Either write an actual benchmark program (I'm too lazy right now) or test it inside your application. – bitmask Sep 19 '11 at 07:18
  • @Alf: `write clear code` <-- !?! --> `measure ` **`b4↯`** `optimize` – sehe Sep 19 '11 at 07:37
  • I'd be careful jumping on the "premature optimization" bandwagon in this case: IMO the code using function pointers is clearer and, above all, much more versatile than the one with the conditions. These are greater benefits than the speed. – stijn Sep 19 '11 at 07:40

5 Answers

6

The function pointer approach lets you make the transitions asynchronous. Rather than just passing dt to the updater, pass the object as well. Now the updater can itself be responsible for the state transitions. This localizes the state transition logic instead of globalizing it in one big ugly if ... else if ... else if ... function.

As far as the cost of this indirection, do you care? You might care if your updaters are so extremely small that the cost of a dereference plus a function call overwhelms the cost of executing the updater code. If the updaters are of any complexity, that complexity is going to overwhelm the cost of this added flexibility.
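A rough sketch of this suggestion, with invented names: the updater receives the object, so each state function can itself decide the next state, keeping transition logic next to the state's code.

```cpp
// Sketch only: Entity, alive, dead and the per-tick damage are
// illustrative, not from the asker's code.
struct Entity;
typedef void (*Updater)(Entity&, int);

struct Entity {
    Updater updater;
    int health;
};

void dead(Entity&, int) { }        // terminal state: do nothing
void alive(Entity& e, int dt) {
    e.health -= dt;                // hypothetical per-tick damage
    if (e.health <= 0)
        e.updater = &dead;         // local transition, no global if/else
}
```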

David Hammen
  • If I haven't misunderstood your answer, in my code this is all handled per object, and the state transition is done through events. When a certain object is interested in an event, and it occurs, it modifies the updater accordingly. I wrote the code above in C style for clarity, nothing else. – amireh Sep 19 '11 at 07:35
  • You did misunderstand my answer to some extent. I suggested putting the event handling, at least insofar as it determines how your updaters behave and which updaters run, in the updaters themselves. Not necessarily the best solution. Separating events from updates is a very good idea. You will of course need some kind of event engine to do this. Polymorphism makes this a very doable task. – David Hammen Sep 19 '11 at 18:14
4

I think I'll agree with the non-believers here. The money question in this case is: how is the pointer value going to be set?

If you can somehow index into a map and produce a pointer, then this approach might justify itself through reducing code complexity. However, what you have here is rather more like a state machine spread across several functions.

Consider that something_else_happens in practice will have to examine the previous value of the pointer before setting it to another value. The same goes for something_different_happens, etc. In effect, you've scattered the logic of your state machine all over the place and made it difficult to follow.
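For reference, "index into a map and produce a pointer" could look something like the following sketch, which centralizes the state-to-function mapping in one table (the state names and functions are illustrative):

```cpp
#include <map>
#include <string>

typedef void (*Updater)(int);

void doFoo(int) { }
void doBar(int) { }
void do_nothing(int) { }

// One central table instead of assignments scattered across handlers.
Updater lookup(const std::string& state) {
    static const std::map<std::string, Updater> table = {
        { "foo",  &doFoo      },
        { "bar",  &doBar      },
        { "idle", &do_nothing },
    };
    std::map<std::string, Updater>::const_iterator it = table.find(state);
    return it != table.end() ? it->second : &do_nothing;
}
```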

Jon
  • In my case, and that's my fault for not clarifying, this is all done on objects which have all the state required for the event handlers to figure out what to do next (and that's where the pointer gets set.) I'll update my question accordingly. – amireh Sep 19 '11 at 07:38
  • Although I agree that scattering the logic of the state machine all over the place may make it difficult to follow, a 5000-line if/else statement containing the state machine is a nightmare to maintain. Clearly, the question here is not about performance but about readability/maintainability. – Adrien Plisson Sep 19 '11 at 07:42
2

Now what I'm really wondering is: am I obsessing about this needlessly?

If you haven't actually run your code, and found that it actually runs too slowly, then yes, I think you probably are worrying about performance too soon.

Herb Sutter and Andrei Alexandrescu in C++ Coding Standards: 101 Rules, Guidelines, and Best Practices devote chapter 8 to this, called "Don’t optimize prematurely", and they summarise it well:

Spur not a willing horse (Latin proverb): Premature optimization is as addictive as it is unproductive. The first rule of optimization is: Don’t do it. The second rule of optimization (for experts only) is: Don’t do it yet. Measure twice, optimize once.

It's also worth reading chapter 9, "Don't pessimize prematurely".

Clare Macrae
  • Thank you for the book link, I haven't read that one, and I will! Regarding optimizing prematurely, it's just that this style made it clearer for me: state transitioning upon events instead of flatly checking against flags, and so my reasoning was half performance and half design based. – amireh Sep 19 '11 at 07:37
  • OK. I took your reference to 'heavy loops' to mean you were worried about speed. Yes, code clarity is very important. – Clare Macrae Sep 19 '11 at 08:53
0

Testing a condition is:

  • fetch a value
  • compare (subtract)
  • jump if zero (or non-zero)

Performing an indirection is:

  • fetch an address
  • jump

It may even be faster!

In fact, you do the "compare" beforehand, in another place, to decide what to call; the result is identical. You have done nothing more than build a dispatch system identical to the one the compiler generates when calling virtual functions. It has been shown that avoiding virtual functions by implementing dispatch through switches doesn't improve performance on modern compilers.

The "don't use indirection / don't use virtual / don't use function pointers / don't use dynamic_cast, etc." advice is, in most cases, just a myth based on historical limitations of early compilers and hardware architectures.
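The equivalence this answer describes can be shown in code: the branch either runs on every call, or runs once to select a pointer; the work performed is the same, and only where the "compare" happens differs. The function names below are invented.

```cpp
typedef int (*Fn)(int);

int doubled(int x) { return 2 * x; }
int negated(int x) { return -x; }

// Compare on every call:
int call_with_branch(bool cond, int x) {
    return cond ? doubled(x) : negated(x);
}

// Compare once, then only indirect calls afterwards:
Fn select(bool cond) {
    return cond ? &doubled : &negated;
}
```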

Emilio Garavaglia
  • 1
    This logic assumes a non-pipelined CPU, i.e. one from a previous century. On today's pipelined CPUs, you get a nasty stall when the jump _location_ depends on the previous fetch. In the first case, the jump location is fixed; in the second, it depends on the fetched value. – MSalters Sep 19 '11 at 08:04
  • @MSalters It mostly depends on where the jump locations are ... located. Modern CPUs can work around the problem you refer to in most "normal" cases. It is not something to worry about if everything related to the jump fits in the same executable module (e.g. not distributed across many DLLs) and is reached by the same single thread. – Emilio Garavaglia Sep 19 '11 at 08:27
  • @MSalters this very much depends on the CPU. Some do stall, noticeably, when faced with an indirect jump. Others do a better job of it. Until you've actually benchmarked the two cases, you can't really make any statement one way or the other. (This obviously applies just as well to Emilio Garavaglia.) – James Kanze Sep 19 '11 at 08:33
  • 1
    @Emilio: A CPU has no concept of DLLs. All jumps are within a single thread, by definition. A jump in effect says: the current thread continues with _that_ instruction. As for the "workaround", please be more specific. In the first case, the jump address depends on the instruction stream only; in the second case it needs a physical fetch. The latter is fundamentally more complex (it absolutely needs a memory access). Of course, fast CPUs will do their best to minimize the costs, but they do that for all common ops, and fundamentally complex ops will remain the slowest. – MSalters Sep 19 '11 at 08:38
  • 1
    You forget that indirection prevents inlining so it will be: fetch address, save current position, jump, set up stack frame for function, do work, restore stack. – adrianm Sep 19 '11 at 09:23
  • @MSalters: About DLLs etc. The problem is how "far" a jump is. The fetch of the address to jump to is required in any case. In one case it is immediately after the opcode; in the other it is where a variable resides (which can be a register, or another page mapped in the internal cache). Since the full set of addresses never changes (the possible targets of all those jumps are finite once the program has been compiled), they will probably be cached and never reloaded. The access time of a cached value is not that different from that of a register. – Emilio Garavaglia Sep 19 '11 at 12:16
  • @adrianm: Indirection does not necessarily prevent inlining if the number of possible destinations fits in a cache page. And since the destinations here are a fixed set (not the result of a computation), maybe no difference can be observed. – Emilio Garavaglia Sep 19 '11 at 12:19
0

The performance difference will depend on the hardware and the compiler optimizer. Indirect calls can be very expensive on some machines, and very cheap on others. And really good compilers may be able to optimize even indirect calls, based on profiler output. Until you've actually benchmarked both variants, on your actual target hardware and with the compiler and compiler options you use in your final release code, it's impossible to say.

If the indirect calls do end up being too expensive, you can still hoist the tests out of the loop, either by setting an enum and using a switch in the loop, or by implementing the loop for each combination of settings and selecting once at the beginning. (If the functions you point to implement the complete loop, this will almost certainly be faster than testing the condition on each iteration, even if indirection is expensive.)
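A sketch of the hoisting idea: test the condition once, outside the hot loop, and run a loop specialized for the result. The names and the per-element "work" below are invented for illustration.

```cpp
enum Mode { FOO, BAR };

int run_foo(const int* data, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += data[i];
    return sum;
}

int run_bar(const int* data, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += 2 * data[i];  // different per-element work
    return sum;
}

int process(const int* data, int n, Mode mode) {
    switch (mode) {              // one test instead of n tests
    case FOO: return run_foo(data, n);
    case BAR: return run_bar(data, n);
    }
    return 0;
}
```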

James Kanze