0

I have profiled my program and it spends 20% of the CPU time basically evaluating the following expression:

abs(x) > abs(y)

where x,y are double-precision floating point variables.

Is there a way to refactor the expression to a faster variant?

The following line (which appears in two different places) takes close to 10% of the CPU time at each place:

(This is a snippet from the function Image_3::TestGradientAtPoint)

       if (abs(maxx[ch]) < abs(a)) maxx[ch] = a; 
01187AC9  mov         eax,dword ptr [ch]  
01187ACC  sub         esp,8  
01187ACF  fld         qword ptr [ebp+eax*8-68h]  
01187AD3  fstp        qword ptr [esp]  
01187AD6  call        abs (11305F9h)  
01187ADB  fld         qword ptr [ebp-70h]  
01187ADE  fstp        qword ptr [esp]  
01187AE1  fstp        qword ptr [ebp-0F8h]  
01187AE7  call        abs (11305F9h)  
01187AEC  add         esp,8  
01187AEF  fcomp       qword ptr [ebp-0F8h]  
01187AF5  fnstsw      ax  
01187AF7  test        ah,41h  
01187AFA  jne         Image_3::TestGradientAtPoint+176h (1187B06h)
01187AFC  mov         eax,dword ptr [ch]  
01187AFF  fld         qword ptr [ebp-70h]  
01187B02  fstp        qword ptr [ebp+eax*8-68h]  

The profiler reports that the calls to abs() take 20% of the CPU time. The method is called on the order of 10^8 times, since I am working with large images.

Edit

I forgot to say that the code runs in Debug mode, and I need to optimize it there a bit, because I still want to be able to use the MSVC debugger in reasonable time.

Boyko Perfanov

5 Answers

6

This may not be faster but if arithmetic expressions are evaluated faster:

if ((x - y) * (-x - y) < 0)
    // then abs(x) > abs(y)

I believe this reduces the work to three operations (two arithmetic expressions and one comparison against zero), versus the three steps of the abs approach, where each abs call checks the sign and negates the value if it is negative, and the two results are then compared.

EDIT:

As andre said, you could always explicitly square the floats. That makes much more sense in retrospect.

if (x * x > y * y)
    // then abs(x) > abs(y)

Since (x-y)(-x-y) = y^2-x^2
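
The squared comparison can be wrapped in a small inline helper so it can be profiled on its own. This is only a minimal sketch: the names `abs_greater_sq` and `abs_greater_ref` are hypothetical, and it assumes the inputs are far from the limits of double. As the overflow concern raised in the comments suggests, `x * x` becomes infinity once the magnitude exceeds roughly 1e154 (and can underflow to zero for very small values), at which point the result no longer matches the abs comparison.

```cpp
#include <cmath>

// Hypothetical helper: true when |x| > |y|, decided by comparing squares.
// Assumes |x| and |y| are well inside the range where x * x neither
// overflows to infinity nor underflows to zero.
inline bool abs_greater_sq(double x, double y)
{
    return x * x > y * y;
}

// Reference version, useful for checking the helper's results.
inline bool abs_greater_ref(double x, double y)
{
    return std::fabs(x) > std::fabs(y);
}
```

For image gradients in a modest numeric range the two versions should agree; for unbounded inputs the reference form is the safer choice.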

SGM1
  • I will test it and let you know. – Boyko Perfanov Mar 05 '13 at 18:08
  • Interesting... I think you might be off by a sign somewhere. If both are positive and `x > y`, it evaluates to negative which is `false`. – Mysticial Mar 05 '13 at 18:10
  • I was thinking of suggesting comparing `x*x > y*y`, but if the numbers are big, the multiplication can overflow. – andre Mar 05 '13 at 18:14
  • Just edited, mixed sign of expression – SGM1 Mar 05 '13 at 18:15
  • +1, I think it works now. Neat little trick. :) – Mysticial Mar 05 '13 at 18:16
  • I implemented it as an inline function so I can profile it. There is a definite speed improvement: after converting only the explicit calls to abs() in a single function, that function sped up considerably (from 44s to 37s). According to a very cursory look at the profiler output, the CPU speed of the implementation has increased to 250%. Thank you. – Boyko Perfanov Mar 05 '13 at 19:15
  • @andre IDK why I sought this convoluted way when your way is probably better and makes much more sense – SGM1 Mar 05 '13 at 19:33
5

Tell your compiler to optimize. In GCC or clang you do this using -O2 or -O3 flags—the latter is more aggressive. In MSVC you can use the /O2 or /Ox flags (IIRC; I rarely use that compiler). You can't expect 100000000 iterations to run quickly without optimizations turned on.

If you want to debug without optimizations turned on, but still within a reasonable time frame, try a smaller data set; or as Mysticial mentioned, debug with optimizations turned on and accept randomly changing values and other arcane observations in your debugger.

4

If the order of maxx[] is not important, you could sort it, and I think that would be faster.

Also, if "a" is the same for all maxx[], you could do a = abs(a); once and then just compare against a directly.
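
If the same value really is compared against every element, this suggestion amounts to hoisting the abs call out of the loop. The sketch below assumes exactly that; the function name `update_max_by_magnitude` and the array-length parameter `n` are hypothetical and not taken from the asker's code. It keeps the signed value of `a` for the assignment and stores its magnitude separately.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical sketch: replace every element whose magnitude is smaller
// than |a| with a, computing |a| once instead of once per element.
inline void update_max_by_magnitude(double* maxx, std::size_t n, double a)
{
    const double abs_a = std::fabs(a);   // hoisted out of the loop
    for (std::size_t ch = 0; ch < n; ++ch) {
        if (std::fabs(maxx[ch]) < abs_a) {
            maxx[ch] = a;                // keeps the original sign of a
        }
    }
}
```

This halves the number of abs calls per comparison, which matters most in a build where the call is not inlined.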

I would need to see more code to try to help you more.

Luis Tellez
2

This may not be any faster but another logical version would be:

// logical replacement for abs(x) > abs(y)
x >= 0 ?
    (y >= 0 ? x > y : x > -y) :
    (y >= 0 ? -x > y : x < y);

Since it only uses comparisons, negations, and branches it may be faster, but no guarantees...

If not, try using fabs instead since it's designed for floating-point numbers.
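
Hand-written replacements are easy to get wrong when the two operands have opposite signs, so it is worth checking any candidate against the plain `std::fabs` form over all sign combinations before profiling it. The following is a sketch of such a check; all three function names are hypothetical, and the sample values are arbitrary.

```cpp
#include <cmath>

// Reference: the straightforward floating-point form.
inline bool abs_greater_fabs(double x, double y)
{
    return std::fabs(x) > std::fabs(y);
}

// Hypothetical branch-based candidate covering all four sign cases.
inline bool abs_greater_branch(double x, double y)
{
    if (x >= 0) return y >= 0 ? x > y : x > -y;
    return y >= 0 ? -x > y : x < y;
}

// Returns true when the candidate agrees with the reference for every
// pair drawn from a small set of sample values of both signs.
inline bool candidate_matches_reference()
{
    const double samples[] = { -5.0, -3.0, -0.0, 0.0, 3.0, 5.0 };
    for (double x : samples)
        for (double y : samples)
            if (abs_greater_branch(x, y) != abs_greater_fabs(x, y))
                return false;
    return true;
}
```

A check like this catches mixed-sign cases such as x = 5, y = -3, where the magnitudes must be compared even though the signs differ.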

D Stanley
  • `abs` is overloaded for all numeric types. – James Kanze Mar 05 '13 at 19:52
  • @JamesKanze I understand, but there _may_ be optimizations with `fabs` that make it faster for floating point types. I don't know either way - I'm just pointing it out as an alternative. – D Stanley Mar 05 '13 at 22:59
  • It seems highly unlikely that `fabs` is in any way different than the overload of `abs`. In fact, because the overloads are pure C++, where as `fabs` is from C, there's a distinct possibility that the overloads are `inline`, whereas `fabs` isn't. – James Kanze Mar 06 '13 at 09:15
2

The first thing to check is that you're actually enabling optimizations. If not, perhaps your compiler isn't inlining the call, resulting in enough overhead that you notice it.

If, as I suspect, you actually have optimization enabled, you're going to have to take an algorithmic approach. I can't think of anything you could do to abs to make it faster.

So you need to consider things like:

  • Do you care about the original negative numbers or can you pre-filter the data to absify it?
  • Do you care about the order? Can you sort the data to improve your algorithm?
  • Are you computing the abs of a non-changing value many times in a loop?
Mark B
  • You are correct, the call to abs() is not inlined. I will check whether and how I can inline it without adding compiler optimizations that would interfere with the rest of the program. – Boyko Perfanov Mar 05 '13 at 18:02