
I need to call a function that takes N boolean variables, and I want to make them constexpr in order to eliminate the comparisons and protect the code from branch mispredictions.

What I mean is:

templateFunc<b1, b2, b3, b4 ...>(args...);

Since b1..bn are just boolean variables and each can have only two states, I could write something like this (shown for two variables):

if (b1 && b2)
  templateFunc<true, true>(args...);
else if (b1 && !b2)
  templateFunc<true, false>(args...);
else if (!b1 && b2)
  templateFunc<false, true>(args...);
else
  templateFunc<false, false>(args...);

The problem is obvious: I'd need 32 calls for 5 variables (2^5 combinations). Is there a better solution?
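The hand-written tree above can, in principle, be generated by recursion: each level tests one runtime bool and appends the result to a compile-time list, and the leaves are the 2^N instantiations. A minimal C++17 sketch of that idea, not taken from this thread; the body of `templateFunc`, `CallTemplateFunc`, `dispatch` and `run_with_flags` are all hypothetical names chosen for illustration:

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the question's templateFunc: each bool
// template parameter switches one step on or off at compile time.
template <bool B1, bool B2, bool B3>
int templateFunc(int x)
{
    if constexpr (B1) x += 1;
    if constexpr (B2) x *= 2;
    if constexpr (B3) x -= 3;
    return x;
}

// Hypothetical wrapper: holds the runtime arguments and forwards the
// collected compile-time flags to templateFunc.
struct CallTemplateFunc {
    int x;
    template <bool... Bs>
    int call() const { return templateFunc<Bs...>(x); }
};

// Base case: every runtime bool has been turned into a template argument.
template <typename F, bool... Bs>
auto dispatch(const F& f, std::integer_sequence<bool, Bs...>)
{
    return f.template call<Bs...>();
}

// Recursive case: test one runtime bool, append it to Bs..., recurse.
// N runtime bools yield the same 2^N leaves as the hand-written tree.
template <typename F, bool... Bs, typename... Rest>
auto dispatch(const F& f, std::integer_sequence<bool, Bs...>, bool b, Rest... rest)
{
    if (b) return dispatch(f, std::integer_sequence<bool, Bs..., true>{}, rest...);
    return dispatch(f, std::integer_sequence<bool, Bs..., false>{}, rest...);
}

// Hypothetical entry point: start with an empty compile-time list.
template <typename F, typename... Bools>
auto run_with_flags(const F& f, Bools... bs)
{
    return dispatch(f, std::integer_sequence<bool>{}, bs...);
}

// Usage: assert(run_with_flags(CallTemplateFunc{5}, true, false, true) == 3);
```

Each call still executes N runtime branches on the way down, so this only pays off when the selected instantiation does enough work (for example, a whole loop) to amortize them.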

Asked by vamirio-chan; edited by Jarod42.

  • Are the boolean values currently not `constexpr`? If not, this approach just moves the comparison somewhere else; it does not eliminate branch mispredictions. BTW, have you measured that this is the bottleneck in your code? – Quimby Aug 24 '21 at 07:43
  • No, unfortunately they are not known at compile time. And yes, this is the bottleneck: a loop that is expected to run millions of iterations and is called quite frequently. – vamirio-chan Aug 24 '21 at 07:45
  • In that case, you are just moving the comparisons out of `run`, which is only useful if you compare more than once per `run`. That said, compilers can move independent checks out of loops or propagate constants through calls if you enable optimizations. – Quimby Aug 24 '21 at 07:47
  • Can you tell me which optimizations exactly? I enabled vectorization (though maybe it doesn't apply here?) and -O3. I ran the code once with constexpr and once with a plain if(), and the difference was 100x. It was the exact same code, except that one call uses templates and the other doesn't. – vamirio-chan Aug 24 '21 at 08:20
  • That is not a fair comparison, right? No optimization can eliminate a comparison that depends on runtime values, but the compiler can move it out of the loop if it does not depend on the loop. I'm not sure about the specific flags for Arduino's compiler, though. – Quimby Aug 24 '21 at 08:32
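To illustrate Quimby's point, here is a small sketch (hypothetical function names, C++17 for `if constexpr`) contrasting a flag tested on every iteration with a flag turned into a template parameter and tested once, outside the loop. Both versions compute the same result; the difference is only where the comparison happens:

```cpp
#include <cassert>
#include <vector>

// Version 1: the runtime flag is tested on every iteration.
long sum_runtime(const std::vector<int>& v, bool negate)
{
    long s = 0;
    for (int e : v) {
        if (negate) s -= e;
        else        s += e;
    }
    return s;
}

// Version 2: the flag is a template parameter, so each instantiation
// contains a loop with no flag test at all.
template <bool Negate>
long sum_fixed(const std::vector<int>& v)
{
    long s = 0;
    for (int e : v) {
        if constexpr (Negate) s -= e;
        else                  s += e;
    }
    return s;
}

// The runtime flag is tested exactly once, outside the loop.
long sum_hoisted(const std::vector<int>& v, bool negate)
{
    return negate ? sum_fixed<true>(v) : sum_fixed<false>(v);
}

// Usage: assert(sum_hoisted(std::vector<int>{1, 2, 3}, true) == -6);
```

An optimizing compiler may perform the same hoisting (often called loop unswitching) on `sum_runtime` by itself, but that is a compiler decision, not a guarantee.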

1 Answer

With std::variant (C++17), you might do the dynamic dispatch via std::visit:

// helper: maps a runtime bool onto one of two distinct types
#include <type_traits>
#include <variant>

std::variant<std::false_type, std::true_type> to_boolean_type(bool b)
{
    if (b) return std::true_type{};
    return std::false_type{};
}

and then

std::visit([&](auto... bs){templateFunc<bs...>(args...);},
           to_boolean_type(b1), to_boolean_type(b2));
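A self-contained sketch of this approach for three flags. The helper is the one above; the body of `templateFunc` (a small loop with three compile-time options) and `run_dispatch` are hypothetical. It forms the template arguments with `decltype(bs)::value`, which should be equivalent to the answer's `bs...` and simply avoids relying on converting a function parameter to `bool` inside a constant expression:

```cpp
#include <cassert>
#include <type_traits>
#include <variant>
#include <vector>

// Helper from the answer: maps a runtime bool onto one of two distinct types.
std::variant<std::false_type, std::true_type> to_boolean_type(bool b)
{
    if (b) return std::true_type{};
    return std::false_type{};
}

// Hypothetical stand-in for templateFunc: a hot loop whose options are
// fixed at compile time, so the loop body contains no flag tests.
template <bool Square, bool SkipNegative, bool Negate>
long templateFunc(const std::vector<int>& v)
{
    long s = 0;
    for (int e : v) {
        if constexpr (SkipNegative) { if (e < 0) continue; }
        long t = e;
        if constexpr (Square) t *= t;
        if constexpr (Negate) t = -t;
        s += t;
    }
    return s;
}

// One std::visit call selects among the 2^3 = 8 instantiations.
long run_dispatch(const std::vector<int>& v, bool b1, bool b2, bool b3)
{
    return std::visit(
        [&](auto... bs) {
            // Each bs is std::false_type or std::true_type, so
            // decltype(bs)::value is a compile-time bool.
            return templateFunc<decltype(bs)::value...>(v);
        },
        to_boolean_type(b1), to_boolean_type(b2), to_boolean_type(b3));
}

// Usage: assert(run_dispatch(std::vector<int>{1, -2, 3}, true, false, false) == 14);
```

The selection still happens at run time, but once per call and outside the loop; whether it compiles to a jump table or a tree of branches is up to the standard library and the compiler.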


Answered by Jarod42.

  • Does this actually make the generated asm more efficient, or is this just getting the compiler to invent the branching for you with fewer source lines? It has to branch somehow to call different versions of the template (or jump to inlined versions of them), but might be using a jump table instead of a tree of branches, which might or might not be better optimized. – Peter Cordes Aug 24 '21 at 13:25
  • It does work. I checked the assembly and it had 8 different functions with the expected return values for a function with 3 boolean variables. (Also, all the if statements were constexpr, so the branches are resolved at compile time.) – vamirio-chan Aug 24 '21 at 17:42
  • @PeterCordes: the branching can happen outside of the loop, for example `for (auto e : v) { if (cond) { foo(e); } else { bar(e); }}` versus `if (cond) {for (auto e : v) { foo(e); } } else { for (auto e : v) { bar(e); } }` (although a compiler might do that transformation on its own for my simple example, similar code can be harder for it to optimize). – Jarod42 Aug 24 '21 at 20:22
  • Right, I misinterpreted the question the first time. I thought it was hoping to eliminate *all* branching, rather than just save typing vs. manually dispatching to one of 2^5 versions of a function / loop (to manually hoist the branching out of the loop). So this being equivalent to that is what you want, with hopefully no extra overhead. A tree of conditional branches vs. one indirect branch outside the loop might still matter if the loop is quite short, but the major goal is still being accomplished. – Peter Cordes Aug 24 '21 at 20:50