EDIT
The answer below misinterprets the OP's question, which was specifically about RUNTIME optimisation, whereas I investigated compile-time optimisation.
Original Answer
Adding to my comment: as long as the exponent is a constant int no larger than INT_MAX, then
#include <cmath>
double pow(double a)
{
return std::pow(a, (int)2147483647);
}
generates
pow(double):
movapd xmm4, xmm0
mulsd xmm4, xmm0
movapd xmm5, xmm4
mulsd xmm5, xmm4
mulsd xmm4, xmm0
movapd xmm6, xmm5
mulsd xmm4, xmm5
mulsd xmm6, xmm5
movapd xmm3, xmm6
mulsd xmm3, xmm6
mulsd xmm3, xmm0
movapd xmm0, xmm4
movapd xmm2, xmm3
movapd xmm1, xmm3
mulsd xmm2, xmm6
mulsd xmm1, xmm3
mulsd xmm2, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm2
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm4
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm4
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm4
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm4
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm4
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm4
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm4
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm1, xmm1
mulsd xmm0, xmm1
ret
but you have to be careful that the exponent actually has type int:
#include <cmath>
double pow(double a)
{
return std::pow(a, (unsigned int) 2147483647);
}
generates
pow(double):
movsd xmm1, QWORD PTR .LC0[rip]
jmp pow
.LC0:
.long 4290772992
.long 1105199103
EDIT
I seem to be wrong. The above was tested with an early version of GCC. In early versions of GCC and CLANG the multiplication is inlined, but in later versions it is not. If you switch compiler versions on godbolt you can see that the expansion above DOES NOT OCCUR.
For example
#include <cmath>
double pow_v2(double a)
{
return std::pow(a, 2);
}
double pow_v3(double a)
{
return std::pow(a, 3);
}
for CLANG 10.0 generates
pow_v2(double): # @pow_v2(double)
mulsd xmm0, xmm0
ret
.LCPI1_0:
.quad 4613937818241073152 # double 3
pow_v3(double): # @pow_v3(double)
movsd xmm1, qword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero
jmp pow # TAILCALL
but for CLANG 5.0 it generates
pow_v2(double): # @pow_v2(double)
mulsd xmm0, xmm0
ret
pow_v3(double): # @pow_v3(double)
movapd xmm1, xmm0
mulsd xmm1, xmm1
mulsd xmm1, xmm0
movapd xmm0, xmm1
ret
It seems that in later versions of the compilers calling the library pow function is judged faster than inlining the multiplications, so the compilers changed their strategy.