I have a code like that:
const uint64_t tsc = __rdtsc();
const __m128 res = computeSomethingExpensive();
const uint64_t elapsed = __rdtsc() - tsc;
printf( "%" PRIu64 " cycles", elapsed );
In release builds, this prints garbage like “38 cycles” because VC++ compiler reordered my code:
const uint64_t tsc = __rdtsc();
00007FFF3D398D00 rdtsc
00007FFF3D398D02 shl rdx,20h
00007FFF3D398D06 or rax,rdx
00007FFF3D398D09 mov r9,rax
const uint64_t elapsed = __rdtsc() - tsc;
00007FFF3D398D0C rdtsc
00007FFF3D398D0E shl rdx,20h
00007FFF3D398D12 or rax,rdx
00007FFF3D398D15 mov rbx,rax
00007FFF3D398D18 sub rbx,r9
const __m128 res = …
00007FFF3D398D1B lea rdx,[rcx+98h]
00007FFF3D398D22 mov rcx,r10
00007FFF3D398D25 call computeSomethingExpensive (07FFF3D393E50h)
What’s the best way to fix?
P.S. I’m aware rdtsc
doesn’t count cycles, it measures time based on CPU’s base frequency. I’m OK with that, I still want to measure that number.
Update: godbolt link