The cos() in math.h run faster than the x86 asm fcos.
The following code is compare between the x86 fcos and the cos() in math.h.
In this code, 1000000 times asm fcos cost 150ms; 1000000 times cos() call cost only 80ms.
How is the fcos implemented in x86? Why is the fcos much slower than cos()?
My enviroment is intel i7-6820HQ + win10 + visual studio 2017.
#include "string"
#include "iostream"
#include<time.h>
#include "math.h"
int main()
{
int i;
const int i_max = 1000000;
float c = 10000;
float *d = &c;
float start_value = 8.333333f;
float* pstart_value = &start_value;
clock_t a, b;
a = clock();
__asm {
mov edx, pstart_value;
fld [edx];
}
for (i = 0; i < i_max; i++) {
__asm {
fcos;
}
}
b = clock();
printf("asm time = %u", b - a);
a = clock();
double y;
for (i = 0; i < i_max; i++) {
start_value = cos(start_value);
}
b = clock();
printf("math time = %u", b - a);
return 0;
}
According to my personal understanding, a single asm instruction is usually faster than a function call. Why in this case the fcos so slow?
Update: I have run the same code on another laptop with i7-6700HQ. On this laptop the 1000000 times fcos cost only 51ms. Why there is such a big difference between the two cpus.