Is there a way to speed up the evaluation of the following expression?

Question

I have profiled my program and it spends 20% of the CPU time basically evaluating the following expression:

abs(x) > abs(y)

where x,y are double-precision floating point variables.

Is there a way to refactor the expression to a faster variant?

The following line (called in two different places) takes close to 10% of the CPU time at each call site:

(This is a snippet from the function Image_3::TestGradientAtPoint)

       if (abs(maxx[ch]) < abs(a)) maxx[ch] = a; 
01187AC9  mov         eax,dword ptr [ch]  
01187ACC  sub         esp,8  
01187ACF  fld         qword ptr [ebp+eax*8-68h]  
01187AD3  fstp        qword ptr [esp]  
01187AD6  call        abs (11305F9h)  
01187ADB  fld         qword ptr [ebp-70h]  
01187ADE  fstp        qword ptr [esp]  
01187AE1  fstp        qword ptr [ebp-0F8h]  
01187AE7  call        abs (11305F9h)  
01187AEC  add         esp,8  
01187AEF  fcomp       qword ptr [ebp-0F8h]  
01187AF5  fnstsw      ax  
01187AF7  test        ah,41h  
01187AFA  jne         Image_3::TestGradientAtPoint+176h (1187B06h)
01187AFC  mov         eax,dword ptr [ch]  
01187AFF  fld         qword ptr [ebp-70h]  
01187B02  fstp        qword ptr [ebp+eax*8-68h]

The profiler reports that the calls to abs() take 20% of the CPU time. The method is called on the order of 10^8 times, because I am working with large images.

Edit

I forgot to mention that the code runs in Debug mode, and I need to optimize it a bit there, because I still want to be able to use the MSVC debugger in a reasonable time.

You may want to post the code around this statement, because this I doubt is the real problem. — Tony The Lion, Mar 05 '13 at 17:30
Tony and Martin, I have updated my question. @aleguna: I am specifically asking if there is a way to optimize this operation. — Boyko Perfanov, Mar 05 '13 at 17:34
@perfanoff, Even if there is, your compiler (assuming it's not 100 years old) will do the optimisation for you. There **could** be a way to reduce the number of times you perform this operation, but we need to see the code to tell. — , Mar 05 '13 at 17:38
@aleguna - or debugger - maybe it's halting on wrong line? I've been misled by that before :( — Martin James, Mar 05 '13 at 17:38
That snippet of code can be compiled in a variety of ways (from good to piss-poor). Can you show the disassembly? I'd like to see which it is. — Mysticial, Mar 05 '13 at 17:41
Need more context. What is the loop in which this line is executed? Maybe one or both calls to `abs` can be hoisted. — Raymond Chen, Mar 05 '13 at 17:43
Ah... Yes, that's a pretty ugly assembly dump. Not only is there a branch, but it's got non-inlined function calls to `abs()`. Is this your own `abs()` function? Most compilers should be treating it as a primitive and will inline it. — Mysticial, Mar 05 '13 at 17:50
@Mysticial, I have added the disassembly - I am wondering exactly if this statement can be optimized by itself. — Boyko Perfanov, Mar 05 '13 at 17:51
The statement can be optimized very well if you use SSE. Try enabling it in the compiler and see if it does it for you. If not, you'll have to resort to intrinsics. Oh, and did you enable optimizations in the first place? — Mysticial, Mar 05 '13 at 17:52
@Mysticial this is in Debug mode, but unfortunately I need Debug mode to be quick, because it takes me 10 minutes to get to the breakpoint where I need to debug. PS I am sorry that I neglected the important fact; this is in debug mode. — Boyko Perfanov, Mar 05 '13 at 17:53
Measuring performance without optimizations turned on… sigh. — , Mar 05 '13 at 17:54
@perfanoff If this is debug mode, then forget it... I can't really say much more than to learn to debug through optimizations. — Mysticial, Mar 05 '13 at 17:57
This might be a stupid question but would casting to unsigned then comparing be faster? — andre, Mar 05 '13 at 17:59
Turn optimization on for the whole project, then exclude the parts you need to debug from being optimized. — Retired Ninja, Mar 05 '13 at 18:00
@perkanoff - Running in debug mode is telling the compiler "Please don't make *any* attempts to make this run fast". And that's what you get. — Bo Persson, Mar 05 '13 at 18:03
Voting to close as too localized; asking to optimize a purposefully unoptimized build, probably not helpful to anyone else and arguably not even OP. — GManNickG, Mar 05 '13 at 18:53

SGM1 · Answer 1 · 2013-03-05T19:39:50.427

6

This may not be faster, but if arithmetic expressions are evaluated faster than calls to abs(), you could try:

if ((x - y) * (-x - y) < 0)
    // then abs(x) > abs(y)

I believe this fixes the work at 3 expressions (the 2 arithmetic expressions and the comparison with zero), rather than the 3 steps of the abs approach, where each abs call has to test for a negative value and either flip the sign or return the value unchanged before the two results are compared.

EDIT:

As andre said, you could always explicitly square the floats. That makes much more sense in retrospect.

if (x * x > y * y)
    // then abs(x) > abs(y)

Since (x-y)(-x-y) = y^2-x^2
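
To make the two variants above easier to profile side by side, here is a minimal sketch that wraps each one in an inline function next to a `std::fabs` reference. The function names are hypothetical and not from the post. As andre's comment notes, the squares can overflow for very large inputs (and underflow for very small ones), so this assumes the values stay well inside the range of a double.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper names, introduced for illustration only.

// Reference version: compares magnitudes via std::fabs.
inline bool abs_greater_ref(double x, double y) {
    return std::fabs(x) > std::fabs(y);
}

// Squaring version: |x| > |y| exactly when x*x > y*y, provided neither
// square overflows to infinity or underflows to zero.
inline bool abs_greater_sq(double x, double y) {
    return x * x > y * y;
}

// Product version: (x - y) * (-x - y) equals y*y - x*x algebraically,
// so a negative product means |x| > |y|.
inline bool abs_greater_prod(double x, double y) {
    return (x - y) * (-x - y) < 0.0;
}
```

For example, `abs_greater_sq(3.0, -2.0)` and `abs_greater_prod(3.0, -2.0)` both agree with `abs_greater_ref(3.0, -2.0)`. In an unoptimized build the calls themselves may not be inlined, so the gain, if any, has to be measured.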

edited Mar 05 '13 at 19:39

answered Mar 05 '13 at 18:06

SGM1


I will test it and let you know. – Boyko Perfanov Mar 05 '13 at 18:08
Interesting... I think you might be off by a sign somewhere. If both are positive and `x > y`, it evaluates to negative which is `false`. – Mysticial Mar 05 '13 at 18:10
I was thinking of suggesting comparing `x*x > y*y`, but if the numbers are big, the multiplication can overflow. – andre Mar 05 '13 at 18:14
Just edited, mixed sign of expression – SGM1 Mar 05 '13 at 18:15
+1, I think it works now. Neat little trick. :) – Mysticial Mar 05 '13 at 18:16
I implemented it as an inline function so that I can profile it. There is a definite speed improvement: after converting only the explicit calls to abs() in a single function, that function has sped up considerably (from 44 s to 37 s). According to a very cursory look at the profiler output, the speed of the implementation has increased to 250%. Thank you. – Boyko Perfanov Mar 05 '13 at 19:15
@andre IDK why I sought this convoluted way when your way is probably better and makes much more sense – SGM1 Mar 05 '13 at 19:33

score 5 · Answer 2 · 2013-03-05T18:31:05.473

5

Tell your compiler to optimize. In GCC or clang you do this using -O2 or -O3 flags—the latter is more aggressive. In MSVC you can use the /O2 or /Ox flags (IIRC; I rarely use that compiler). You can't expect 100000000 iterations to run quickly without optimizations turned on.

If you want to debug without optimizations turned on, but still within a reasonable time frame, try a smaller data set; or as Mysticial mentioned, debug with optimizations turned on and accept randomly changing values and other arcane observations in your debugger.
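
A middle ground raised in the comments on the question is to build the project with optimizations and switch them off only around the code you need to step through. The following is a minimal sketch, assuming MSVC's `#pragma optimize`; the function names are hypothetical, and the pragmas are guarded so that other compilers skip them.

```cpp
#include <cassert>
#include <cmath>

// Hot helper: with /O2 (or -O2) enabled for the project, a call like this
// is typically inlined by the compiler.
inline bool abs_less(double a, double b) {
    return std::fabs(a) < std::fabs(b);
}

#ifdef _MSC_VER
#pragma optimize("", off)   // MSVC: disable optimizations for the functions below
#endif

// Hypothetical function that you want to step through in the debugger.
double update_max(double current_max, double candidate) {
    if (abs_less(current_max, candidate)) current_max = candidate;
    return current_max;
}

#ifdef _MSC_VER
#pragma optimize("", on)    // MSVC: restore the optimizations given on the command line
#endif
```

The pragma has to appear outside a function body and applies to the functions defined after it, so the code under investigation needs to sit between the `off` and `on` lines.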

edited Mar 05 '13 at 18:31

answered Mar 05 '13 at 17:56

Thank you for your outlook on the situation. – Boyko Perfanov Mar 05 '13 at 18:03

score 4 · Answer 3 · answered Mar 05 '13 at 17:42

4

Well, if the order of maxx[] is not important, you could sort it, and I think it would be faster.

Another thing: if "a" is the same for all maxx[], you could do a = abs(a); once and then just compare against a directly.
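
The second idea can be sketched as follows. The surrounding loop was not posted, so the function and its loop are hypothetical. The sketch stores the magnitude in a separate variable instead of overwriting `a`, because the original statement assigns the signed value of `a` to `maxx[ch]`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical loop, assumed for illustration: replaces every entry whose
// magnitude is smaller than |a| with a, computing |a| only once.
void raise_to(double* maxx, std::size_t n, double a) {
    const double abs_a = std::fabs(a);   // hoisted out of the loop
    for (std::size_t ch = 0; ch < n; ++ch) {
        if (std::fabs(maxx[ch]) < abs_a) maxx[ch] = a;
    }
}
```

This removes one of the two abs calls per iteration; the call on `maxx[ch]` remains because that value differs on each pass.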

I would need to see more code to try to help you more.

answered Mar 05 '13 at 17:42

Luis Tellez


Thank you, this is a good answer, but unfortunately the order of maxx[] cannot be changed. – Boyko Perfanov Mar 05 '13 at 17:50

D Stanley · Answer 4 · 2013-03-05T17:56:00.980

2

This may not be any faster but another logical version would be:

// logical replacement for abs(x) > abs(y)
// each branch knows both signs, so abs(v) is either v or -v
x >= 0 ?
    (y >= 0 ? x > y : x > -y) :
    (y >= 0 ? -x > y : x < y);

Since it only uses comparisons, sign flips, and branches, it may be faster, but no guarantees...

If not, try using fabs instead since it's designed for floating-point numbers.
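
Before profiling a branch-only variant, it is worth checking it against `std::fabs` for every sign combination, since mixed signs are easy to get wrong. This is a minimal sketch with hypothetical function names; the sample values are arbitrary.

```cpp
#include <cassert>
#include <cmath>

// Branch-only comparison of magnitudes (hypothetical name).
// Each branch knows the sign of x and y, so |v| is either v or -v.
inline bool abs_greater_branchy(double x, double y) {
    if (x >= 0) return y >= 0 ? x > y : x > -y;
    return y >= 0 ? -x > y : -x > -y;
}

// Compares the branchy version with std::fabs over all sign combinations
// of a few sample values, including zero and equal magnitudes.
inline void check_sign_cases() {
    const double vals[] = {-5.0, -3.0, 0.0, 3.0, 5.0};
    for (double x : vals)
        for (double y : vals)
            assert(abs_greater_branchy(x, y) == (std::fabs(x) > std::fabs(y)));
}
```

Calling `check_sign_cases()` once at startup of a debug build is enough to catch a sign mix-up such as treating x = 5, y = -3 as false.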

edited Mar 05 '13 at 17:56

answered Mar 05 '13 at 17:50

D Stanley


`abs` is overloaded for all numeric types. – James Kanze Mar 05 '13 at 19:52
@JamesKanze I understand, but there _may_ be optimizations with `fabs` that make it faster for floating point types. I don't know either way - I'm just pointing it out as an alternative. – D Stanley Mar 05 '13 at 22:59
It seems highly unlikely that `fabs` is in any way different than the overload of `abs`. In fact, because the overloads are pure C++, where as `fabs` is from C, there's a distinct possibility that the overloads are `inline`, whereas `fabs` isn't. – James Kanze Mar 06 '13 at 09:15

score 2 · Answer 5 · answered Mar 05 '13 at 17:52

The first thing to check is that you're actually enabling optimizations. If not, perhaps your compiler isn't inlining the call, resulting in enough overhead that you notice it.

If, as I suspect, you actually have optimization enabled, you're going to have to take an algorithmic approach. I can't think of anything you could do to abs to make it faster.

So you need to consider things like:

Do you care about the original negative numbers or can you pre-filter the data to absify it?
Do you care about the order? Can you sort the data to improve your algorithm?
Are you computing the abs of a non-changing value many times in a loop?
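
The first question in the list above can be sketched like this, under the assumption that the signs of the original values are not needed later. Both function names are hypothetical. The data is converted to magnitudes in one pass, and the hot loop then uses plain comparisons.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Returns a copy of the input with every element replaced by its magnitude.
std::vector<double> absified(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   [](double v) { return std::fabs(v); });
    return out;
}

// Largest value in data that already went through absified(); no abs calls
// are needed here because every element is non-negative.
double max_magnitude(const std::vector<double>& abs_data) {
    double best = 0.0;
    for (double v : abs_data)
        if (v > best) best = v;
    return best;
}
```

This only pays off when the same data is compared many times, because the pre-pass itself costs one abs call per element.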

You are correct, the call to abs() is not inlined. I will check whether to and how to inline it without having to add compiler optimizations that will interfere with the rest of the program. — Boyko Perfanov, Mar 05 '13 at 18:02
