
Please take branch prediction into account before answering this question.

I have some scenarios where I can replace a conditional statement with a call through a function pointer, something like this (component-based programming versus inheritance is a similar kind of scenario):

    enum ShapeType { SQUARE, RECTANGLE };

    class Shape
    {
    public:
        float Area() const
        {
            if (type == SQUARE)
                return length * length;
            else // RECTANGLE
                return length * breadth;
        }

        ShapeType type;
        float length;
        float breadth;
    };

The same class can be written like this.

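    class Shape
    {
    public:
        // Pointer to a const member function that computes an area.
        typedef float (Shape::*AreaFunc)() const;

        // Sets the function that Area() will call.
        void SetAreaFunction(AreaFunc func) { currentAreaFunc = func; }

        float SquareArea() const    { return length * length; }   // square area
        float RectangleArea() const { return length * breadth; }  // rectangle area

        float Area() const
        {
            return (this->*currentAreaFunc)();  // indirect call, no if
        }

        float length;
        float breadth;

    private:
        AreaFunc currentAreaFunc = &Shape::SquareArea;  // current area function
    };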

If you consider the two cases above, both achieve the same result. But I'm thinking about the performance overhead: in the second case, I'm avoiding the branch prediction problem by making a function call instead.
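The claim that both versions give the same result can be checked with a small sketch. The two classes are renamed `ShapeIf` and `ShapeFnPtr` (hypothetical names) so they can live in one file, and `ShapeFnPtr` picks its area function once in the constructor rather than through `SetAreaFunction`; that is an assumption made for brevity.

```cpp
enum ShapeType { SQUARE, RECTANGLE };

// Version 1: selects the formula with a conditional branch on every call.
struct ShapeIf {
    ShapeType type;
    float length, breadth;
    float Area() const {
        if (type == SQUARE) return length * length;
        return length * breadth;  // RECTANGLE
    }
};

// Version 2: selects the formula once, by storing a pointer to a member function.
class ShapeFnPtr {
public:
    using AreaFunc = float (ShapeFnPtr::*)() const;

    ShapeFnPtr(ShapeType type, float length, float breadth)
        : length(length), breadth(breadth),
          currentAreaFunc(type == SQUARE ? &ShapeFnPtr::SquareArea
                                         : &ShapeFnPtr::RectangleArea) {}

    float SquareArea() const    { return length * length; }
    float RectangleArea() const { return length * breadth; }

    // Indirect call: no if here, but the CPU still has to predict the call target.
    float Area() const { return (this->*currentAreaFunc)(); }

private:
    float length, breadth;
    AreaFunc currentAreaFunc;
};
```

Both give identical results; the difference is only in how the choice is made at run time (a conditional jump versus an indirect call).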

Now, please let me know which is the better practice and the better-optimized code in this kind of scenario. (By the way, I don't like the statement "premature optimization is the root of all evil", because optimization has its benefits, so I do consider optimizing my code!)

P.S.: I wouldn't mind if anyone gave a detailed overview of how bad branch misprediction can be, even at the assembly level.

Update: After profiling (code similar to the above), the if condition won in this kind of scenario. Can anyone give a reason for this? Function-call code can be prefetched since there is no branching, right? But here it looks the other way around: the branching code wins! :O Profiled on an Intel Mac running OS X, with GCC at -O3/-Os.

Sergey K.
Ayyappa
  • Try both and measure. – fredoverflow Oct 01 '11 at 09:27
  • Why not set up a test case and profile it yourself? – Kerrek SB Oct 01 '11 at 09:28
  • @FredOverflow: That example is just a scenario. I want to know the best practices for this. – Ayyappa Oct 01 '11 at 09:29
  • @KerrekSB: Branch prediction differs across platforms and architectures, so I can't rely on my own test cases. – Ayyappa Oct 01 '11 at 09:31
  • @Ayyappa: you're really close to the "micro-optimization" domain here. Code first for correctness, readability and maintainability. If you identify that particular function as a bottleneck via profiling, try out variations and stick with the one that works best in that particular spot of that particular program with that compiler on that CPU. – Mat Oct 01 '11 at 09:32
  • @Mat: Thank you. I love the three words you said: "correctness, readability and maintainability". By the way, the reason behind this question is to learn the best practice. – Ayyappa Oct 01 '11 at 09:34
  • @Ayyappa: there is no best practice. This type of optimization is really situation-dependent. Different CPUs have different costs for branching vs. an indirect function call. There is no "this is always better"; the only best practice is to profile your code and react according to that data. – Mat Oct 01 '11 at 09:38
  • You may not *like* the statement that premature optimization is the root of all evil, but you need to understand the reason behind it. Optimization *can* have benefits, but it can also have *significant* drawbacks in terms of readability. Very often the most efficient code is much more complicated than code which is *efficient enough*. – Jon Skeet Oct 01 '11 at 09:39
  • @JonSkeet: Yeah, I agree; I can't argue with your point, as it's truly valid. Sorry, I guess I didn't make my point clear in the question. Usually I try to write readable code, but on some slower platforms that isn't always possible. – Ayyappa Oct 01 '11 at 09:42
  • You should use else instead of else if. With function pointers, the target is always mispredicted. –  Jun 17 '12 at 14:21

3 Answers


You replaced an if statement with an indirection.

Both your if statement and the indirection require memory access.

However, the if will result in a short jump, which probably won't invalidate the pipeline, while the indirection may invalidate it.

On the other hand, the indirection is an unconditional jump, while the if statement is a conditional jump, which the branch predictor may miss.
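Since the concern is that the conditional jump can be mispredicted, one option not covered in the question is to avoid the jump altogether. The sketch below (function names are hypothetical) computes the same area with a select instead of an if. Compilers often, but not always, lower such a ternary to a conditional move or a blend, and they may also turn the branchy form into the same code, so inspect the generated assembly (for example with `-S`) before drawing conclusions.

```cpp
enum ShapeType { SQUARE, RECTANGLE };

// Branchy form: a conditional jump whose direction depends on the data.
float AreaBranchy(ShapeType type, float length, float breadth) {
    if (type == SQUARE) return length * length;
    return length * breadth;
}

// Branch-free form (if the compiler cooperates): choose the second factor
// with a select, then do a single multiplication. If this becomes a
// conditional move, there is no jump for the branch predictor to miss.
float AreaSelect(ShapeType type, float length, float breadth) {
    float other = (type == SQUARE) ? length : breadth;
    return length * other;
}
```

Whether this wins depends on how predictable `type` is: a well-predicted branch is nearly free, while a select always pays for evaluating both inputs.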

It is hard to tell which is faster without testing it. I predict that the if statement will win.

Please share your results!

Lior Kogan
  • A short jump doesn't require flushing the prefetched pipeline, even when the branch is mispredicted??? – Ayyappa Oct 01 '11 at 10:17
  • As you said, the if conditional won the race. Can you please tell me the difference between an indirect jump and a conditional jump? – Ayyappa Oct 01 '11 at 10:55
  • An unconditional jump can always be "predicted" and the code can be prefetched. For a conditional jump, it is only probabilistic. The probability gets better with newer architectures, and it is getting quite complex lately. See http://www.bit-tech.net/hardware/cpus/2008/11/03/intel-core-i7-nehalem-architecture-dive/5 for example. – Lior Kogan Oct 01 '11 at 11:18

You need to profile such code to be able to make a definite statement for a specific environment (compiler, compiler version, OS, hardware), and you need to measure in a specific application to know whether this even matters for that application. Unless you are writing library code, don't bother with this until profiling has shown it to be a hot spot in your application.

Just write the most readable code, the code that is easiest to maintain. It's always easier to optimize clean, bug-free, easily readable code than to fix bugs in optimized code.

That said, I remember Lippman, in his book Inside the C++ Object Model, citing research that found virtual functions (basically function pointers) to be at least as fast as switching over types in real-world applications. I don't know the details, but it's somewhere in the book.
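For comparison with switching over a type tag, a virtual-function version of the question's `Shape` might look like the sketch below. The class names `Square`, `Rectangle` and the helper `TotalArea` are illustrative, and nothing here reproduces the research Lippman cites. In common implementations a virtual call is an indirect call through a vtable, so it is the object-oriented counterpart of the question's function-pointer version.

```cpp
#include <memory>
#include <vector>

// Base class: the type tag and the if are replaced by a virtual function.
struct Shape {
    virtual ~Shape() = default;
    virtual float Area() const = 0;
};

struct Square : Shape {
    float length;
    explicit Square(float l) : length(l) {}
    float Area() const override { return length * length; }
};

struct Rectangle : Shape {
    float length, breadth;
    Rectangle(float l, float b) : length(l), breadth(b) {}
    float Area() const override { return length * breadth; }
};

// Sums areas through virtual calls; the caller never inspects the type.
float TotalArea(const std::vector<std::unique_ptr<Shape>>& shapes) {
    float total = 0.0f;
    for (const auto& s : shapes) total += s->Area();
    return total;
}
```

Adding a new shape here means adding a class rather than editing every if/switch, which is usually the stronger argument for this design than raw speed.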

sbi

It is more likely that the optimizer can apply its magic to the if statement than to a dynamically changing function pointer. This is just a guess; theoretically, the compiler is allowed to do anything that it can prove does not change the semantics.

But when you do not call a function and only implement a branch (using an if in your case), the CPU is more likely to apply its own magic, i.e. reorder instructions, prefetch, and the like. If there is a function call in between, most CPUs will very likely "flush" their pipelines, and these CPU optimizations will not be possible.

That said, if you put everything, callers and callees, inside a header, the compiler may do away with the function calls, reorder things itself, and so on.
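One way to give the compiler that kind of visibility is to make the callee a compile-time constant. In this hypothetical sketch, free functions `SquareArea` and `RectangleArea` stand in for the question's member functions, and `SumRuntime`/`SumStatic` are illustrative names. Whether either call is actually inlined depends on the compiler and optimization level.

```cpp
#include <cstddef>

float SquareArea(float length, float /*breadth*/) { return length * length; }
float RectangleArea(float length, float breadth)  { return length * breadth; }

using AreaFn = float (*)(float, float);

// Target known only at run time: the compiler generally has to emit an
// indirect call, unless it can prove which function `fn` points to.
float SumRuntime(AreaFn fn, const float* len, const float* br, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += fn(len[i], br[i]);
    return total;
}

// Target is a template argument: each instantiation calls one fixed function
// whose body is visible here, so the compiler is free to inline it.
template <AreaFn Fn>
float SumStatic(const float* len, const float* br, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += Fn(len[i], br[i]);
    return total;
}
```

Usage: `SumStatic<SquareArea>(len, br, n)` fixes the choice at compile time, which only works when the shape kind is known statically; `SumRuntime` keeps the run-time flexibility of the question's `SetAreaFunction`.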

Try to measure it yourself: call it 1M times. With C++11, use <chrono> and std::chrono::steady_clock::now() (a monotonic clock).
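A minimal timing harness along those lines might look like this. It uses `std::chrono::steady_clock`; `Item`, `AreaIf`, `MakeItems` and `TimeIt` are hypothetical names, and the random mix of shape types is an assumption meant to make the branch hard to predict. A single wall-clock measurement like this gives only a rough estimate, not a line-by-line profile.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

enum ShapeType { SQUARE, RECTANGLE };

struct Item { ShapeType type; float length, breadth; };

// The if-based area computation from the question, as a free function.
float AreaIf(const Item& s) {
    if (s.type == SQUARE) return s.length * s.length;
    return s.length * s.breadth;
}

// Builds n items with randomly mixed types, so the branch is hard to predict.
std::vector<Item> MakeItems(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> coin(0, 1);
    std::vector<Item> items(n);
    for (auto& it : items)
        it = Item{coin(gen) ? SQUARE : RECTANGLE, 2.0f, 3.0f};
    return items;
}

// Times `reps` passes over `items` with a monotonic clock and returns the
// elapsed nanoseconds. The checksum is written to `sink` to discourage the
// optimizer from discarding the loop.
template <class AreaFn>
long long TimeIt(const std::vector<Item>& items, AreaFn area, int reps, double& sink) {
    auto start = std::chrono::steady_clock::now();
    double total = 0.0;
    for (int r = 0; r < reps; ++r)
        for (const Item& it : items) total += area(it);
    auto stop = std::chrono::steady_clock::now();
    sink = total;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

To compare the two approaches, pass a second callable that dispatches through a function pointer, and rerun with the items sorted by type; the gap between sorted and shuffled input gives a rough sense of what mispredictions cost on that machine.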

Update: My experience is: don't over-optimize your code by hand; it is very likely that you make it worse. Let the compiler do the work, and to that end make as much code visible to it as you can. If you do optimize by hand, you definitely need a profiler: try out alternatives, and use the hints some profilers offer you. But don't forget that this must be fine-tuned very carefully. Speed is only one measure of quality; there are also readability, testability, reusability, and many more. And to quote Donald Knuth: "Premature optimization is the root of all evil."

towi
  • I heard that a mispredicted branch will flush the prefetched code in the pipeline. Is that not true? Why can't the processor prefetch the function, which has no branch prediction overhead? Please clarify; I might be wrong about some concepts. – Ayyappa Oct 01 '11 at 10:09
  • @Ayyappa - It *could*, but this is processor specific. Some big iron *can* prefetch several branches at once in parallel. This requires memory bandwidth to match, though, and an x86 probably doesn't have that. – Bo Persson Oct 01 '11 at 11:18
  • Yes, Bo is right. That is highly CPU-specific, but I believe x86 *can*, nowadays. Or at least it guesses and makes predictions; sometimes these are right, sometimes they are wrong. Thus one can even optimize the ifs (with profiling, as someone else pointed out). Go ahead and profile your code! Really! But one thing is almost certain: a failed (local) branch is not as bad as a far jump (i.e. a function call). The failed branch *might* invalidate the pipeline, whereas the far jump almost certainly *will*. – towi Oct 01 '11 at 14:17
  • Thanks for the info, towi. I profiled similar code and updated the question with the results. – Ayyappa Oct 02 '11 at 16:14
  • @Ayy: Hm, no, I'd need more data. Profiling means line-by-line timing; I guess you did an overall timing, which is good for a start. No, I can't tell you offhand. Compiler? CPU? Optimization level? Line-by-line timing? A good profiling tool that might give you insights is Intel's VTune. It has a trial period, I think. – towi Oct 02 '11 at 17:11
  • @towi: I profiled it on an Intel Mac running OS X (GCC, optimization -O3). I had two cases, one for the conditional and one for the function call, each running 10,000,000 times. It gave a rough estimate of which is faster. – Ayyappa Oct 03 '11 at 02:45