I am building a spatial octree. To determine which branch/octant a given point (x,y,z) should be placed in, I use this function:

if (x>x_centre) {
    xsign = 1;
}
else {
    xsign = 0;
}

if (y>y_centre) {
    ysign = 1;
}
else {
    ysign = 0;
}

if (z>z_centre) {
    zsign = 1;
}
else {
    zsign = 0;
}

return xsign + 2*ysign + 4*zsign;

It returns a number between 0 and 7 that is unique for every octant. It turns out this snippet is called a great many times, and it gets quite time-consuming when building large trees.

Is there an easy way to speed this process up?

This already gives a 30 percent speed-up:

xsign = x>x_centre;
ysign = y>y_centre;
zsign = z>z_centre;

return xsign + 2*ysign + 4*zsign;

Any other tips?

Re Captcha
renger
  • use of ternary operator may be an option – Ali Kazmi May 21 '14 at 08:33
  • Two things: You can mark the function as `inline`, hoping that the compiler will inline the code at the call site. Secondly, remember that `true` is equivalent to `1` and `false` is equivalent to `0`. The last bit will help you discard the `if` statements. – Some programmer dude May 21 '14 at 08:34
  • If I have to place, let's say, ten million items in my tree, the average depth is log(ten million)/log(8) ≈ 8. So I inevitably have to call this function eighty million times. What do you mean by mispredictions? – renger May 21 '14 at 08:39
  • @JoachimPileborg, thanks! That already gave an improvement of 30 percent. – renger May 21 '14 at 08:44
  • Instead of using multiplies, use shifts: 2 * a = a << 1, 4 * a = a << 2, etc. (only works with positive numbers) –  May 21 '14 at 08:48
  • @renger you also removed the branches with the new code. – concept3d May 21 '14 at 08:49
  • @renger You removed the if statements, which I suspect accounts for a good part of the 30% improvement. But only profiling can tell. See http://en.wikipedia.org/wiki/Branch_predictor – concept3d May 21 '14 at 08:56
  • That final version might be as good as it gets. Have a look at the assembly code the compiler is creating (and post it here). – Skizz May 21 '14 at 08:56
  • @concept3d: There is a chance that the compiler has got rid of the branches entirely. For example, on IA-32 there is a `setg` instruction which sets a byte to 0 or 1 depending on the state of the flags. – Skizz May 21 '14 at 08:59
  • @Skizz, I think you are right. I am using Dev-C++ for Windows, and have no clue how to access the assembly code :( – renger May 21 '14 at 09:01
  • @Skizz Good to know, I think as you said showing the assembly will tell if this code can be improved. – concept3d May 21 '14 at 09:04
  • @renger: One way to find the compiler output is to run the code in a debugger, place a breakpoint on the code in question, when execution stops, the debugger should have an option to view the assembly. – Skizz May 21 '14 at 09:10
  • `return (x>x_centre ? 1 : 0) | (y>y_centre ? 2 : 0) | (z>z_centre ? 4 : 0)` might be faster – scones May 21 '14 at 09:48
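
The suggestions in the comments above (use the comparison results directly as 0/1, shift instead of multiply, mark the function `inline`) can be combined into one small function. This is only a minimal sketch: the name `octant_index`, the `double` coordinate type, and passing the centre as parameters are assumptions for illustration, since the post does not show the surrounding code.

```cpp
// Hypothetical helper: returns the octant (0..7) of point (x, y, z)
// relative to the centre (x_centre, y_centre, z_centre).
// Coordinate type double is an assumption; the original post does not state it.
inline int octant_index(double x, double y, double z,
                        double x_centre, double y_centre, double z_centre) {
    // Each comparison yields a bool; converting it to int gives 0 or 1,
    // so no if/else is needed in the source.
    return  static_cast<int>(x > x_centre)          // bit 0: x side
         | (static_cast<int>(y > y_centre) << 1)    // bit 1: y side
         | (static_cast<int>(z > z_centre) << 2);   // bit 2: z side
}
```

For example, `octant_index(1, 1, 1, 0, 0, 0)` gives 7 and `octant_index(-1, 1, -1, 0, 0, 0)` gives 2, matching `xsign + 2*ysign + 4*zsign`. An optimizing compiler may well produce the same machine code for both forms, so it is worth profiling before assuming this one is faster.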
