
In code using pow(double x, double p) (a large share of the calls have p = 2.0), I observed that my code runs clearly faster when p = 2.0 than when p = 2.000000001. I conclude that, on my compiler (gcc 4.8.5), the implementation of pow detects at runtime when the call is a square.

Following this observation, I conclude that I don't need a specific implementation when I know that p is 2. But my code must be cross-platform, hence my question:

Is pow optimized when the exponent is an integer on most C++03 compilers?

In my current context, "most of the compilers" = "gcc >= 4.8, intel with msvc, intel on unix".
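
For context, this is roughly the kind of timing comparison behind the observation (a simplified sketch, not my actual code; the loop size and the command-line exponent are only there to keep the call from being constant-folded):

    #include <cmath>
    #include <cstdio>
    #include <cstdlib>
    #include <ctime>

    int main(int argc, char** argv)
    {
        // The exponent comes from the command line so the compiler cannot
        // constant-fold the pow() call; it defaults to 2.0.
        double p = (argc > 1) ? std::atof(argv[1]) : 2.0;

        const int N = 10000000;
        volatile double sink = 0.0;   // keeps the loop from being optimized away

        std::clock_t start = std::clock();
        for (int i = 0; i < N; ++i)
            sink += std::pow(1.0 + i * 1e-7, p);
        std::clock_t end = std::clock();

        std::printf("p = %.9f : %.3f s\n", p,
                    (double)(end - start) / CLOCKS_PER_SEC);
        return 0;
    }

Running it once with `2.0` and once with `2.000000001` as the argument is the kind of comparison that shows the difference I'm describing.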

Caduchon
  • Well, check [for each compiler on godbolt](https://godbolt.org/z/GjYoUP) and find out for yourself. It looks like `pow(x, 2)` is replaced by a single `mul`. – KamilCuk Apr 22 '20 at 07:02
  • You can see here (https://godbolt.org/z/t5UvB6) that your version of GCC tries to remove the call to pow if it knows the exponent at compile time. When the exponent is 2, this means replacing it with a multiplication. – Tharwen Apr 22 '20 at 07:03
  • If you know in advance that a big part of your use cases are simple squares, you could explicitly define a function like `double square(double x) { return x * x; }`, to be sure. – Bob__ Apr 22 '20 at 07:12
  • I didn't find such a requirement in the standard. – Thomas Sablik Apr 22 '20 at 07:12
  • @Tharwen Nice tool, but it detects an explicit call with a constant. In my case, I have the value `2.0` in a `double` variable that I pass to pow. So pow is always called, but it is really faster when the value in the variable is `2.0`. – Caduchon Apr 22 '20 at 07:19
  • In fact you can put in an integer exponent up to MAXINT 2147483647 and it will inline it as a series of multiplications. Put in (MAXINT+1) and it calls the pow function. Also you can only use *int* constants. long or unsigned int etc just call the pow function. – bradgonesurfing Apr 22 '20 at 07:22
  • And yes, it also works with a constant (https://godbolt.org/z/vS556A), if that was the question. It's a simple optimization that both gcc and clang do. MSVC most likely does too (godbolt didn't work for me today, so I couldn't test it). – Lukas-T Apr 22 '20 at 07:26
  • @ThomasSablik has the right of it. This is guaranteed by no one. I've seen `pow(5,2)` return 24, so even if it somehow wound up faster, the sucker was wrong. I recommend sticking to integer multiplication if you want iron-clad guarantees. – user4581301 Apr 22 '20 at 07:31
  • @Caduchon: due to the way floating point numbers are usually represented (IEEE 754), checking if the exponent is a power of two or integer and then optimizing for this case might make sense. Especially since `pow` is very likely to be used with small integers like `2` or `3`. – vgru Apr 22 '20 at 07:39
  • "faster?" is actually the wrong question. `pow` is for floating points and you can get surprising results when used with integers (see e.g. [here](https://stackoverflow.com/questions/25678481/why-does-pown-2-return-24-when-n-5-with-my-compiler-and-os?noredirect=1&lq=1)). Conclusion: if you want to square numbers, write `x*x`. – 463035818_is_not_an_ai Apr 22 '20 at 07:39
  • When you say "most" compilers, do you have specific compilers in mind? If you listed each one, there would be a clear answer to this question. – VLL Apr 22 '20 at 07:39
  • I write industrial software that mainly runs on supercomputers and personal laptops, and the compilers used can change with every new client we get. For the moment, we use gcc 4.*, gcc 8.*, intel with msvc, intel on unix. In my experience, there are a lot of things not in the standard but clearly available in all the common compilers on computers (I don't have to compile for satellites, nuclear units, or a unix kernel written in javascript :-p ). – Caduchon Apr 22 '20 at 07:51
  • If you look at the pow implementation in glibc (current version) there is in fact a runtime optimization for natural numbers: https://github.com/bminor/glibc/blob/92b963699aae2da1e25f47edc7a0408bf3aee4d2/sysdeps/i386/fpu/e_pow.S#L117 You would have to go through the glibc versions used on your specific platforms to confirm whether this is always applied. This is also only the i386 version, so other architectures may differ. – bradgonesurfing Apr 22 '20 at 08:29

2 Answers


Yes, the standard library does attempt a runtime optimization when the exponent is detected to be a natural number. Looking at the current glibc i386 implementation of pow, you can find the following code

    /* First see whether `y' is a natural number.  In this case we
       can use a more precise algorithm.  */
    fld %st     // y : y : x
    fistpll (%esp)      // y : x
    fildll  (%esp)      // int(y) : y : x
    fucomp  %st(1)      // y : x
    fnstsw
    sahf
    jne 3f

embedded in the implementation. The full code can be found [on GitHub](https://github.com/bminor/glibc/blob/92b963699aae2da1e25f47edc7a0408bf3aee4d2/sysdeps/i386/fpu/e_pow.S#L117).

Note that for other versions of glibc and other architectures the answer may differ.
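
To make the idea concrete, here is a rough C++ sketch of what that check amounts to (`pow_sketch` is an illustrative name of mine; the real routine works on the x87 register stack, uses a more precise algorithm, and handles many more corner cases):

    #include <cmath>

    // Sketch of the runtime check: if the exponent turns out to be a whole
    // number (and fits in a long long), use exponentiation by squaring
    // instead of the general algorithm.
    double pow_sketch(double x, double y)
    {
        long long n = (long long)y;        // assumes |y| fits in a long long
        if ((double)n == y)                // exponent is an integer value
        {
            bool negative = n < 0;
            if (negative) n = -n;
            double result = 1.0, base = x;
            while (n)                      // exponentiation by squaring
            {
                if (n & 1) result *= base;
                base *= base;
                n >>= 1;
            }
            return negative ? 1.0 / result : result;
        }
        return std::exp(y * std::log(x));  // simplified general path (valid for x > 0)
    }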

bradgonesurfing

EDIT

The answer below misinterprets the OP's question, which was specifically about RUNTIME optimisation, whereas I investigated compile-time optimisation.

Original Answer

Adding to my comment: as long as the exponent is a constant int less than or equal to MAXINT, then

#include <cmath>

double pow(double a)
{
    return std::pow(a, (int)2147483647);
}

generates

pow(double):
        movapd  xmm4, xmm0
        mulsd   xmm4, xmm0
        movapd  xmm5, xmm4
        mulsd   xmm5, xmm4
        mulsd   xmm4, xmm0
        movapd  xmm6, xmm5
        mulsd   xmm4, xmm5
        mulsd   xmm6, xmm5
        movapd  xmm3, xmm6
        mulsd   xmm3, xmm6
        mulsd   xmm3, xmm0
        movapd  xmm0, xmm4
        movapd  xmm2, xmm3
        movapd  xmm1, xmm3
        mulsd   xmm2, xmm6
        mulsd   xmm1, xmm3
        mulsd   xmm2, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm2
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm4
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm1
        mulsd   xmm0, xmm1
        ret

but you have to be careful to use an int literal

#include <cmath>

double pow(double a)
{
    return std::pow(a, (unsigned int) 2147483647);
}

generates

pow(double):
        movsd   xmm1, QWORD PTR .LC0[rip]
        jmp     pow
.LC0:
        .long   4290772992
        .long   1105199103

EDIT

I seem to be wrong. The above was tested with an early version of GCC. In early versions of GCC and CLANG the multiplication is inlined, but in later versions this does not happen. If you switch the compiler versions on godbolt you can see that the above DOES NOT OCCUR.

For example

#include <cmath>

double pow_v2(double a)
{
    return std::pow(a, 2);
}

double pow_v3(double a)
{
    return std::pow(a, 3);
}

compiled with CLANG 10.0 generates

pow_v2(double):                             # @pow_v2(double)
        mulsd   xmm0, xmm0
        ret
.LCPI1_0:
        .quad   4613937818241073152     # double 3
pow_v3(double):                             # @pow_v3(double)
        movsd   xmm1, qword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero
        jmp     pow                     # TAILCALL

but compiled with CLANG 5.0 it generates

pow_v2(double):                             # @pow_v2(double)
        mulsd   xmm0, xmm0
        ret
pow_v3(double):                             # @pow_v3(double)
        movapd  xmm1, xmm0
        mulsd   xmm1, xmm1
        mulsd   xmm1, xmm0
        movapd  xmm0, xmm1
        ret

It seems that for later versions of the compilers, calling the pow function is faster than inlining the multiplications, so the compilers changed their strategy.
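
Regardless of what a particular compiler version does, if the goal is just to guarantee that the common p == 2.0 case costs a single multiplication, an explicit runtime dispatch along the lines suggested in the comments side-steps the question entirely (the helper names here are illustrative, not library functions):

    #include <cmath>

    // Illustrative helpers: dispatch on the runtime exponent yourself
    // instead of relying on the library's internal fast path.
    inline double square(double x) { return x * x; }

    inline double pow_or_square(double x, double p)
    {
        if (p == 2.0)              // the common case described in the question
            return square(x);
        return std::pow(x, p);     // everything else goes to the library pow
    }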

bradgonesurfing
  • This is a compile-time constant. The OP is asking about runtime integer exponents. – phuclv Apr 22 '20 at 07:28
  • I only use `pow(double,double)` and the second argument is never a constant value. But most of the time, its effective value is 2.0. – Caduchon Apr 22 '20 at 07:33
  • The OP writes: ``I conclude that, on my compiler (gcc 4.8.5), the implementation of pow detects when it's a square.`` If the compiler is detecting something then it is a constant. It is also possible that at runtime the pow intrinsic function has different behaviour depending on the detected exponent. – bradgonesurfing Apr 22 '20 at 07:39
  • @bradgonesurfing, I didn't say "the compiler detects something", but "the implementation of pow detects something" ;-) My question concerns runtime detection. – Caduchon Apr 22 '20 at 07:53
  • Sorry, my fault. I interpreted your statement incorrectly. Thank you for the clarification. It would be nice if Godbolt inlined the *pow* implementation; then you would be able to see what it does. – bradgonesurfing Apr 22 '20 at 08:11
  • I've updated my answer to show it was a misunderstanding. – bradgonesurfing Apr 22 '20 at 08:17
  • @bradgonesurfing If your answer is wrong, delete it. – VLL Apr 23 '20 at 08:35