// if I know that in_x will never be bigger than Max
template <unsigned Max>
void foo(unsigned in_x)
{
    unsigned cap = Max;

    // I can tell the compiler this loop will never run more than floor(log2(Max)) + 1 times
    for (; cap != 0 && in_x != 0; cap >>= 1, in_x >>= 1)
    {
    }
}

As the code above shows, my guess is that if I simply write

for (; in_x != 0; in_x >>= 1)

the compiler won't unroll the loop, because it cannot know the maximum possible value of in_x.

I'd like to know whether I'm right, and whether there are better ways to handle this kind of thing.


Or, more generally: can one write code that tells the compiler the range of some run-time value, without that code necessarily being compiled into the run-time binary?
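
One way to express such a range hint, sketched here under assumptions: MSVC provides the `__assume` intrinsic, GCC and Clang provide `__builtin_unreachable()`, and C++23 adds a standard `[[assume(expr)]]` attribute. The macro name `ASSUME` and the function `count_shifts` are hypothetical names made up for this illustration. The hint emits no run-time check, but nothing guarantees that the optimizer will use it to unroll the loop, and violating the assumption is undefined behavior.

```cpp
// Hypothetical portability macro: promise the optimizer that cond holds.
#if defined(_MSC_VER)
#  define ASSUME(cond) __assume(cond)
#elif defined(__GNUC__) // also defined by Clang
#  define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(cond) ((void)0) // unknown compiler: no hint
#endif

// Counts how many right shifts it takes for in_x to reach zero.
// Precondition (not checked at run time): in_x <= Max.
template <unsigned Max>
unsigned count_shifts(unsigned in_x)
{
    ASSUME(in_x <= Max); // range hint only; no code is meant to be emitted

    unsigned steps = 0;
    for (; in_x != 0; in_x >>= 1)
    {
        ++steps;
    }
    return steps;
}
```

A call such as `count_shifts<1024>(x)` is only valid when `x <= 1024` really holds. Whether the hint changes the generated code would have to be checked in the compiler's assembly output.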


Truly, fighting with the compiler XD

// With MSVC:
// Without __forceinline, the loop is unrolled fine, but the function itself is not inlined.
// With __forceinline, lol, the entire loop is unrolled (or should I say the whole tree is expanded)...
// The compiler freezes when Max is something like 1024.
template <int Max>
__forceinline void find(int **in_a, int in_size, int in_key)
{
    if (in_size == 0)
    {
        return;
    }

    if (Max == 0)
    {
        return;
    }

    {
        int m = in_size / 2;

        if ((*in_a)[m] >= in_key)
        {
            find<Max / 2>(in_a, m, in_key);
        }
        else
        {
            *in_a = *in_a + m + 1;

            find<Max - Max / 2 - 1>(in_a, in_size - (m + 1), in_key);
        }
    }
}
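
For comparison, the same search can be written as a single loop that reuses the `cap` trick from the first snippet, so that only one path is emitted instead of a tree with both branches expanded. This is a minimal sketch, not a measured improvement: the name `find_bounded` is hypothetical, it returns the position instead of updating a pointer in place, and whether a given compiler unrolls it is not guaranteed.

```cpp
// Lower-bound search over a sorted array, with a compile-time cap on the
// iteration count. Precondition (not checked): size <= Max.
// Returns a pointer to the first element >= key, or a + size if none exists.
template <unsigned Max>
const int* find_bounded(const int* a, unsigned size, int key)
{
    // size at least halves on every step, and so does cap, so size <= cap
    // holds throughout and the loop always finishes before cap runs out.
    for (unsigned cap = Max; cap != 0 && size != 0; cap >>= 1)
    {
        unsigned m = size / 2;

        if (a[m] >= key)
        {
            size = m;          // continue in the left half
        }
        else
        {
            a += m + 1;        // continue in the right half
            size -= m + 1;
        }
    }
    return a;
}
```

Because the trip count depends only on `Max`, the loop has a bound the compiler can see at compile time, which is the property the original `foo` was trying to expose.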
BlueWanderer
  • These kinds of micro-optimizations tend to be very hit-and-miss when you're fighting with the compiler. I almost always end up unrolling manually, but I've never tried to prevent the compiler from unrolling automatically. – Mysticial Mar 03 '12 at 08:25
  • Even if a compiler knows `Max` and computes `log(Max)`, that value may be large enough that unrolling is a bad idea. One could argue that the compiler should use a threshold, but the right threshold may depend on the specific loop. Performance prediction is a hard compiler problem. – Jerry Mar 03 '12 at 11:42
  • @Jerry log(Max) will be <= 32 (unsigned, not that large). – J.N. Mar 03 '12 at 12:49

1 Answer


The proper way to achieve this kind of behavior is to unroll the loop yourself using template metaprogramming (TMP). Even then, you are relying on the compiler's cooperation for deep inlining, which is not guaranteed. Have a look at the following code to see if it helps:

template <unsigned char MaxRec>
inline void foo(unsigned in_x)
{
    if (MaxRec == 0) // constant condition, folded away at compile time
        return;      // ends the pseudo-recursion

    if (in_x == 0) {
        // TODO: end recursion
        return;
    }

    // TODO: process this iteration

    // Note: this is NOT real recursion (which the compiler would not be able
    // to inline): foo<N> calls a different function, foo<N-1>.
    // The clamp keeps foo<0> from trying to instantiate foo<-1>.
    foo<(MaxRec > 0 ? MaxRec - 1 : 0)>(in_x >> 1);
}

// Usage:
foo<32>(in_x); // 32 levels cover any 32-bit in_x; I doubt the compiler will inline 32 times, but you can try.
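
With a C++17 compiler, the same pseudo-recursion can be stopped with `if constexpr`, so the last level never instantiates another one. The following is a sketch with the TODOs filled in by an assumed workload (counting the shifts); the name `count_steps_unrolled` is hypothetical and not part of the code above.

```cpp
// Requires C++17 (if constexpr).
// Counts how many right shifts it takes for in_x to reach zero, giving up
// after at most MaxRec steps. Each level calls a different instantiation,
// so the compiler is free to inline the whole chain.
template <unsigned MaxRec>
inline unsigned count_steps_unrolled(unsigned in_x)
{
    if constexpr (MaxRec == 0)
    {
        return 0; // depth exhausted; nothing further is instantiated
    }
    else
    {
        if (in_x == 0)
        {
            return 0; // early exit at run time
        }
        // One unit of work for this level, then descend one level.
        return 1 + count_steps_unrolled<MaxRec - 1>(in_x >> 1);
    }
}
```

`count_steps_unrolled<32>(x)` covers every 32-bit value, while a smaller depth such as `count_steps_unrolled<5>(x)` is enough when `x < 32` is known. Whether the chain is actually inlined still depends on the compiler and its settings.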
J.N.
  • MSC will unroll a binary search with a depth below 5 this way, though without inlining the outermost foo... If forced, it will inline both branches instead of unrolling the loop. – BlueWanderer Mar 03 '12 at 17:37