I have the following nested for-loop:

for(k = 0; k < n; ++k) {
    for(m = 0; m < n; ++m) {
        /* other logic altering a */
        if(a[index] != 0) count++;
    }
}

where a is an array of uint32_t. Since n can be quite large (but constant), and this is the only branch (besides the comparisons of k and m with n), I would like to optimize it away.

The distribution of zero and non-zero in a can be considered uniformly random.

My first approach was

count += a[index] & 1;

but then count would be incremented only for odd values, not for every non-zero value.
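A minimal sketch of the mismatch, assuming a flat array and two hypothetical helper names (`count_with_branch`, `count_low_bit`) that are not part of the real code base:

```cpp
#include <cstddef>
#include <cstdint>

// Reference behaviour: the conditional increment from the loop above.
std::size_t count_with_branch(const std::uint32_t* a, std::size_t len) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (a[i] != 0) count++;
    }
    return count;
}

// First approach: adds the lowest bit, so even non-zero values are missed.
std::size_t count_low_bit(const std::uint32_t* a, std::size_t len) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        count += a[i] & 1u;
    }
    return count;
}
```

For an input such as {0, 1, 2, 3, 4, 0}, the first function counts 1, 2, 3 and 4, while the second counts only 1 and 3.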

In addition, I also have a case where a contains bool, but according to C++ Conditionals, true and false are defined as non-zero and zero, so this is basically equivalent to the problem above.

Marco A.
YnkDK
  • Do you know that you have a branch? (i.e. did you look at the optimized asm output?) And do you know that this branch is the main performance problem, rather than the left-out calculations? – deviantfan May 16 '15 at 06:58
  • `count += a[index] != 0;` – Beta Carotin May 16 '15 at 07:00
  • I do not. Is there any way to write a comment in the code that also shows up in the assembly? Since this is part of a huge code base, it might be hard to find in the asm – YnkDK May 16 '15 at 07:03
  • @BetaCarotin Can I be sure that this evaluates to 1 if the condition holds? And not just some arbitrary non-zero value? – YnkDK May 16 '15 at 07:04
  • @YnkDK Yes, BetaCarotin's code is guaranteed to work (standard §4.7 [conv.integral]). But as said, I guess your compiler already did this (at least if it is not VS). As for the comments, that depends on the compiler. – deviantfan May 16 '15 at 07:06
  • About finding the place, you could use inline asm to insert some nops and search for those. – Beta Carotin May 16 '15 at 07:29
  • @deviantfan It seems that you are correct. At least I'm getting the same assembly code whether I use the original method or the one proposed by BetaCarotin – YnkDK May 16 '15 at 07:29
  • Micro-optimisation on this level is pointless. Better spend time on working out where the code is slowest – Ed Heal May 16 '15 at 08:22
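The nop-marker idea from the comments could look roughly like the sketch below. It assumes GCC or Clang, whose inline-asm syntax is used here; the macro name `ASM_MARKER` and the function name are hypothetical. Note that a volatile asm statement inside the loop may itself change how the loop is optimized, so the marked build is only a guide to where the code lands, not an exact copy of the unmarked output.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: GCC/Clang inline-asm syntax. On other compilers the marker
// expands to nothing, so the code still compiles but leaves no trace.
#if defined(__GNUC__)
#define ASM_MARKER() __asm__ __volatile__("nop\n\tnop\n\tnop")
#else
#define ASM_MARKER() ((void)0)
#endif

std::size_t count_nonzero_marked(const std::uint32_t* a, std::size_t len) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        ASM_MARKER();              // three nops right before the statement of interest
        if (a[i] != 0) count++;
        ASM_MARKER();              // three nops right after it
    }
    return count;
}
```

Searching the generated assembly for three consecutive nop instructions should then lead to the neighbourhood of the conditional increment.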

1 Answer

As stated in the comments on the question, `if(a[index] != 0) count++;` does not produce a branch in this case. I checked this in the assembly: the compiler generated the same code for the original statement and for the branch-free form.

For the sake of completeness, an equivalent to that statement is `count += a[index] != 0;` (according to standard §4.7 [conv.integral]).
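A minimal sketch of the two forms side by side, plus the bool case from the question. The function names and the use of std::vector are assumptions for illustration only; the real code indexes a inside the nested loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Original form: conditional increment.
std::size_t count_if_form(const std::vector<std::uint32_t>& a) {
    std::size_t count = 0;
    for (std::uint32_t v : a) {
        if (v != 0) count++;
    }
    return count;
}

// Equivalent form: the comparison yields a bool, which converts to 0 or 1.
std::size_t count_add_form(const std::vector<std::uint32_t>& a) {
    std::size_t count = 0;
    for (std::uint32_t v : a) {
        count += (v != 0);
    }
    return count;
}

// Same idea when the elements are already bool.
std::size_t count_true(const std::vector<bool>& a) {
    std::size_t count = 0;
    for (bool v : a) {
        count += v;  // false adds 0, true adds 1
    }
    return count;
}
```

Whether either form ends up as a branch depends on the compiler and optimization flags, so it is still worth inspecting the generated assembly (for example with the -S option of GCC or Clang) rather than assuming.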

YnkDK