
I have to call f(0) and f(1).
The parameter (0 or 1) is used only in switch-case(s).
How can I force or guide the compiler to optimize out the switch-case (going from the "expensive" version below to the "cheap" one) whenever possible?

In my godbolt demo, the switch-case is not optimized out.

Example: expensive

int f(int n) {
    switch(n) {
        case 0: {
            return 5;
        };break;

        case 1: {
            return 10;
        };break;
    }

    return 15;
}

int main(){
    f(0);
}

Example: cheap (my dream)

int f0(){
    return 5;
}

int f1(){
    return 10;
}

int main(){
    f0();
}

More information:

In the real case, there are more values than just 0 and 1, and they are enum class values.
From the caller's point of view, the parameter is always a constant, e.g. f(CALLBACK_BEGIN), f(CALLBACK_END).

Why can't I just use f0()/f1()?

I want to group them into a single function because I sometimes need to create a pass-through function. It is easier to maintain if I can write it like this:

int g(int n){  .... }
int f(int n){  return g(n); }

This is easier to maintain than:

int g0(){ .... }    int g1(){ .... }
int f0(){  return g0(); }
int f1(){  return g1(); }
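A minimal sketch of the single-function style described above, assuming a hypothetical `enum class Callback` (standing in for the real CALLBACK_BEGIN/CALLBACK_END values) and placeholder bodies for `g`:

```cpp
#include <cassert> // assert(), for quick checks such as assert(f(Callback::End) == 10)

// Hypothetical stand-in for the real enum class (CALLBACK_BEGIN, CALLBACK_END, ...).
enum class Callback { Begin, End, Other };

// The single implementation: one switch instead of g0(), g1(), ...
int g(Callback c) {
    switch (c) {
        case Callback::Begin: return 5;
        case Callback::End:   return 10;
        default:              return 15;
    }
}

// The pass-through stays a one-liner, whatever the number of enumerators.
int f(Callback c) { return g(c); }
```

Whether a call such as `f(Callback::Begin)` ends up as a plain constant depends on the optimizer seeing `g`'s body, i.e. on both functions being in the same translation unit or on link-time optimization being enabled.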

I would also prefer to avoid templates, so I can't use the solution in Optimize Template replacement of a switch. My reasons are:

  • Templates must be implemented in a header.
  • I need the implementation to be in a .cpp file, so I would have to forward it to another non-template function.
    That gets messy very fast.

Premature optimization?

In my case, it is called 60*10000+ (more than 600,000) times per second.

Edit

I misunderstood the result of the godbolt demo. It is actually optimized.
(Thanks to M.M and Benoît for pointing it out.)

Edit2

After receiving both great answers, I tested it and found that Visual C++ is very smart.
It can optimize things like:

void f(int p1,int p2){
    if(p1==0 && p2==1){  //zero cost
        // ...
    }
}
f(0,1);  //inside main

In the real case, there are 3-5 layers of function indirection, but Visual C++ can still see through them!

The result is consistent with a similar post: Constant condition in a loop: compiler optimization

cppBeginner
  • Your godbolt shows that the call to f(0) is optimized away – Benoît Sep 05 '17 at 04:15
  • @Benoît Shame on me! I am very new to compiled languages (assembly). – cppBeginner Sep 05 '17 at 04:16
  • Just make sure that f is defined as inline in one of your headers. – Benoît Sep 05 '17 at 04:22
  • @Benoît defining `f` inline in a header is not possible without also putting the function body in the header – M.M Sep 05 '17 at 04:26
  • Obviously. But having it inline (and even constexpr) will allow the compiler to optimize this. – Benoît Sep 05 '17 at 11:38
  • Your choice of bracing and indenting is confusing to me. Why `case 0: { return 5; };break;` instead of simply `case 0: return 5;`? – Barry Sep 05 '17 at 22:16
  • @Barry Agree. It was an attempt to refactor from a more complex code to create MCVE. The full code actually does many things e.g. cache data, then returns void. – cppBeginner Sep 06 '17 at 01:49

3 Answers


An easy solution would be to make your function constexpr, which can ease optimizations a lot.

//  v--- that
constexpr int f(int n) {
    switch(n) {
        case 0: {
            return 5;
        };break;

        case 1: {
            return 10;
        };break;
    }

    return 15;
}

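This makes the function callable at compile time. If you pass parameters that are constant expressions, the compiler can execute the function during compilation. Since you pass enum values as parameters, it's very likely that the function is executed at compile time. Compile-time evaluation is only guaranteed, though, when the result is used where a constant expression is required, e.g. to initialize a constexpr variable or inside a static_assert. (Note that a switch inside a constexpr function requires C++14 or later; in C++11 the body of a constexpr function is essentially limited to a single return statement.)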
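A minimal sketch of the enum case, assuming a hypothetical `enum class Callback` in place of the real enum. The constexpr variable and the static_assert are contexts that require a constant expression, so there the compiler has no choice but to evaluate f:

```cpp
// Hypothetical stand-in for the real enum class (requires C++14 or later).
enum class Callback { Begin, End, Other };

constexpr int f(Callback c) {
    switch (c) {
        case Callback::Begin: return 5;
        case Callback::End:   return 10;
        default:              return 15;
    }
}

// A constexpr variable must be initialized with a constant expression,
// so this call is guaranteed to be evaluated by the compiler.
constexpr int begin_value = f(Callback::Begin);

// static_assert is another context that forces compile-time evaluation.
static_assert(f(Callback::End) == 10, "f(End) should be 10");
```

Plain calls like `f(Callback::Begin);` in ordinary code are still allowed; they are just not guaranteed to be folded.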

If your heavy function needs some runtime values, try to factor out the parts that could be marked constexpr, and consider using templates (they really are useful for making code faster!)

constexpr int const_part_of_f(int n) {
    switch(n) {
        case 0: {
            return 5;
        };break;

        case 1: {
            return 10;
        };break;
    }

    return 15; // fallback, so every input has a defined result
}

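bool get_runtime_value(); // declared only for illustration: some check that can only run at runtime

template<int n>
int f() {
    if (get_runtime_value()) {
        // Since `n` is a compile time constant, storing the result in a constexpr
        // variable forces `const_part_of_f` to be evaluated at compile time,
        // even though `f` is not a constexpr function.
        constexpr int result = const_part_of_f(n);
        return result;
    }

    return 15;
}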

If you really want to help the optimizer, avoid excessive memory allocation. For example, if you need an array whose size is known at compile time, use std::array instead of std::vector.
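A small sketch of that advice, assuming the size (4 here) is known at compile time; `sum_vector` and `sum_array` are hypothetical names:

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kSize = 4; // assumed size, known at compile time

// std::vector: the buffer is allocated on the heap at runtime,
// even though its size never changes.
int sum_vector() {
    std::vector<int> v(kSize, 1);
    int sum = 0;
    for (int x : v) sum += x;
    return sum;
}

// std::array: the size is part of the type and no heap allocation happens.
// It can even be used inside a constexpr function (C++14 or later).
constexpr int sum_array() {
    const std::array<int, kSize> a{{1, 1, 1, 1}};
    int sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i];
    return sum;
}
```

Beyond avoiding the allocation, std::array can be used inside a constexpr function, which ties back to the constexpr advice above; std::vector could not be used that way before C++20.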

As pointed out by other users, the extra code in the binary was there to initialize the iostream globals. This, however, doesn't change the fact that constexpr functions are more easily optimized by the compiler.

Guillaume Racicot
  • I just read http://en.cppreference.com/w/cpp/language/constexpr and tried to find limitations of this approach (`constexpr`), but found none. More complex things still work OK (http://coliru.stacked-crooked.com/a/826109f921f6089d). Are there any real limitations (besides the inability to use virtual functions) that I should know about? – cppBeginner Sep 05 '17 at 04:47
  • @cppBeginner you cannot abuse undefined behavior, objects used in a constexpr function cannot have a non-trivial destructor, and you cannot use the free store (heap allocations). Also, all variables must be initialized at creation. These are the main limitations I know of. – Guillaume Racicot Sep 05 '17 at 04:51
  • @cppBeginner also, note that a constexpr function can still be called at runtime. If you pass a non-constant parameter to a constexpr function, the function will be evaluated at runtime, silently. – Guillaume Racicot Sep 05 '17 at 04:54
  • "the function will be evaluated ..." <-- Do you mean (1) the whole function or (2) a certain (smallest possible) part of the function? Thanks. – cppBeginner Sep 05 '17 at 04:56
  • @cppBeginner well, the compiler is free to apply optimizations and evaluate parts of the function at compile time, but any expression that depends on a runtime value must be evaluated at runtime. If you pass a non-constant parameter to a constexpr function, it behaves exactly as if the function were not marked constexpr, except that the function may be more easily inlined or optimized. – Guillaume Racicot Sep 05 '17 at 04:59
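To illustrate the last two comments, a minimal sketch with a hypothetical function `scale`: the same constexpr function is evaluated by the compiler when its result is required as a constant, and is simply called at runtime when its argument is not a constant.

```cpp
// Hypothetical example function; any constexpr function behaves the same way.
constexpr int scale(int n) {
    return n * 5;
}

// Constant argument in a context that requires a constant expression:
// the compiler evaluates the call.
constexpr int at_compile_time = scale(2);
static_assert(at_compile_time == 10, "evaluated during compilation");

// Non-constant argument: the same function is called at runtime,
// as if it were not marked constexpr (it may still be inlined).
int at_runtime(int n) {
    return scale(n);
}
```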

In your demo, the call f(0); in main is optimized out, as you can see from the assembly for main:

main:
        mov     r0, #0
        bx      lr

The code for f(int) already looks pretty optimal to me; I think calling a function would be less efficient than just issuing one assembly instruction.

M.M
  • Oh no! `_GLOBAL__sub_I__Z1fi:` is not related? – cppBeginner Sep 05 '17 at 04:15
  • @cppBeginner that's initializing the iostreams library (it will go away if you take out `#include <iostream>`) – M.M Sep 05 '17 at 04:16
  • Thanks. By the way, is it *generally* optimized out? i.e. how much can I rely on this type of optimization? – cppBeginner Sep 05 '17 at 04:20
  • @cppBeginner try it and find out – M.M Sep 05 '17 at 04:22
  • @cppBeginner: You don't write fast code by attempting to micro-optimize up front. Rather, you write the code, profile it, then micro-optimize the parts that end up needing it. In many cases there is little work to do, and your time is better spent writing clean, maintainable code. A small fraction of your program is what's going to need the most attention, so get the whole thing working first so you can identify that fraction accurately. – GManNickG Sep 05 '17 at 19:38
  • @GManNickG First, the program already worked. Then, I profiled and found a bottleneck around this small portion. I solved it. After that, I found another bottleneck around it again. This looped many times. I concluded that this part is performance-critical. Now, I want to make my code easier to maintain (group `f0()`+`f1()` into `f()`), but I am so scared. I posted the question to make sure that it is possible without causing another bottleneck. Do you think it is a bad practice? – cppBeginner Sep 06 '17 at 01:46
  • @cppBeginner Nobody can possibly know the answer but you :) That's why M.M said "try it and find out". You simply have to try it and measure. No up-front guessing. – GManNickG Sep 06 '17 at 02:14
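Following that advice, a rough timing sketch for comparing the two variants from the question. The names `time_us`, `compare` and `sink` are hypothetical, the iteration count is up to you, and a micro-benchmark like this is only a first check; profiling the real program remains the reliable measurement:

```cpp
#include <cassert> // assert(), for sanity checks such as assert(f(0) == f0())
#include <chrono>
#include <cstdio>

// The question's single function: the value is selected by a switch.
int f(int n) {
    switch (n) {
        case 0: return 5;
        case 1: return 10;
    }
    return 15;
}

// The "cheap" alternative from the question.
int f0() { return 5; }

// Storing each result in a volatile variable stops the compiler from
// deleting the loop body as dead code (a crude, commonly used trick).
volatile int sink = 0;

// Calls `fn` `iterations` times and returns the elapsed time in microseconds.
template <typename Fn>
long long time_us(Fn fn, int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) sink = fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

// Prints both timings; build with the same optimization flags as the real program.
void compare(int iterations) {
    const long long t_switch = time_us([] { return f(0); }, iterations);
    const long long t_direct = time_us([] { return f0(); }, iterations);
    std::printf("f(0): %lld us, f0(): %lld us\n", t_switch, t_direct);
}
```

For example, `compare(600000)` runs as many calls as the question mentions per second; if both timings come out about the same, the single-function version costs nothing measurable in this setting.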

With templates, you can get close to your dream:

template <int N> int f();

template <> int f<0>() { return 5; }
template <> int f<1>() { return 10; }

int main(){
    f<0>();
}

or, in C++17, with if constexpr:

template <int N> int f()
{
    static_assert(N == 0 || N == 1);

    if constexpr (N == 0) {
        return 5;
    } else if constexpr (N == 1) {
        return 10;
    }
}
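Because the real parameter in the question is an enum class, the enumeration type can itself be the non-type template parameter. A sketch (C++17), assuming a hypothetical `enum class Callback`; the constexpr on `f` is an addition that only makes the results checkable with static_assert:

```cpp
// Hypothetical stand-in for the real enum class.
enum class Callback { Begin, End };

template <Callback C>
constexpr int f() {
    if constexpr (C == Callback::Begin) {
        return 5;
    } else {
        // Only instantiated when C != Begin, so an enumerator added later
        // without its own branch triggers this compile-time error.
        static_assert(C == Callback::End, "unhandled Callback value");
        return 10;
    }
}
```

Call sites then read much like the question's dream version, e.g. `f<Callback::Begin>()`.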
Jarod42
  • @cppBeginner: Implementations can go in cpp if done correctly. – Jarod42 Sep 05 '17 at 11:52
  • Look at *"explicit instantiation"*, you don't need an intermediate `g()`. BTW, with full specialization (as in the first example), you have already instantiated `f<0>`/`f<1>` even without `main`. – Jarod42 Sep 05 '17 at 13:34
  • Full specializations can indeed be declared in .h, defined in .cpp, exactly because they no longer depend on template parameters. – MSalters Sep 05 '17 at 13:54
  • @MSalters Oh, my bad. I totally forgot about it. Thanks, Jarod42 and MSalters. XD – cppBeginner Sep 06 '17 at 01:36
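A sketch of what Jarod42 and MSalters describe, written as one file with comments marking a hypothetical header/source split. The function `g` is a hypothetical name used only to show explicit instantiation:

```cpp
#include <cassert> // assert(), for sanity checks such as assert(f<0>() == 5)

// ---- f.h (hypothetical header) ----
template <int N> int f();   // primary template: declared only
template <> int f<0>();     // full specializations: declared only
template <> int f<1>();

template <int N> int g();   // declared only; defined in the .cpp

// ---- f.cpp (hypothetical source file, includes f.h) ----
// A full specialization no longer depends on a template parameter, so it
// behaves like an ordinary function and can be defined outside the header.
template <> int f<0>() { return 5; }
template <> int f<1>() { return 10; }

// Alternative: define the primary template here and explicitly instantiate
// the values that other files are allowed to use.
template <int N> int g() { return N == 0 ? 5 : N == 1 ? 10 : 15; }
template int g<0>();        // explicit instantiation definitions
template int g<1>();
```

Other translation units can call `f<0>()` or `g<1>()` with only the header in scope; a call like `g<2>()` would compile there but fail to link, because no definition for it was instantiated.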