I'm looking into measuring benchmark performance using the time-stamp counter (TSC) found in x86 CPUs. It's a useful register, since it counts in a monotonic unit of time which, on CPUs with an invariant TSC, is immune to the clock speed changing. Very cool.
Here is an Intel document showing asm snippets for reliably benchmarking using the TSC, including using cpuid for pipeline synchronisation. See page 16.
To read the start time, it says (I annotated a bit):
__asm volatile (
"cpuid\n\t" // writes e[abcd]x
"rdtsc\n\t" // writes edx, eax
"mov %%edx, %0\n\t"
"mov %%eax, %1\n\t"
//
:"=r" (cycles_high), "=r" (cycles_low) // outputs
: // inputs
:"%rax", "%rbx", "%rcx", "%rdx"); // clobber
I'm wondering why scratch registers are used to take the values of edx and eax. Why not remove the movs and read the counter value right out of edx and eax? Like this:
__asm volatile(
"cpuid\n\t"
"rdtsc\n\t"
//
: "=d" (cycles_high), "=a" (cycles_low) // outputs
: // inputs
: "%rbx", "%rcx"); // clobber
By doing this, you save two registers, reducing the likelihood of the C compiler needing to spill.
Am I right? Or are those MOVs somehow strategic?
(I agree that you do need scratch registers to read the stop time, since in that scenario the order of the instructions is reversed: you have rdtscp, ..., cpuid, and the cpuid instruction would destroy the result of rdtscp.)
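For concreteness, here is a minimal sketch of how I would wrap both reads as complete functions, assuming my reasoning above holds. The names tsc_start and tsc_stop are my own (hypothetical, not from the Intel document), and the code assumes x86-64, GCC or Clang extended asm, and a CPU that supports rdtscp. The start read binds the outputs directly to edx and eax; the stop read keeps the movs because cpuid runs after rdtscp.

```c
#include <stdint.h>

/* Start read (my proposed variant): cpuid serialises, then rdtsc leaves the
 * counter in edx:eax. The outputs are bound directly to those registers with
 * the "d" and "a" constraints, so no movs are needed. cpuid's eax input is
 * left unspecified, as in the snippets above; only its serialising effect
 * is used. */
static inline uint64_t tsc_start(void)
{
    uint32_t cycles_high, cycles_low;

    __asm__ volatile(
        "cpuid\n\t"                              /* writes e[abcd]x */
        "rdtsc\n\t"                              /* writes edx, eax */
        : "=d" (cycles_high), "=a" (cycles_low)  /* outputs */
        :                                        /* inputs */
        : "%rbx", "%rcx");                       /* clobber */

    return ((uint64_t)cycles_high << 32) | cycles_low;
}

/* Stop read: here the movs are required, because cpuid comes after rdtscp
 * and overwrites eax and edx. The clobber list keeps the compiler from
 * placing the "r" outputs in any register that cpuid or rdtscp writes. */
static inline uint64_t tsc_stop(void)
{
    uint32_t cycles_high, cycles_low;

    __asm__ volatile(
        "rdtscp\n\t"                             /* writes edx, eax, ecx */
        "mov %%edx, %0\n\t"
        "mov %%eax, %1\n\t"
        "cpuid\n\t"                              /* writes e[abcd]x */
        : "=r" (cycles_high), "=r" (cycles_low)  /* outputs */
        :                                        /* inputs */
        : "%rax", "%rbx", "%rcx", "%rdx");       /* clobber */

    return ((uint64_t)cycles_high << 32) | cycles_low;
}
```

The intended usage would be to call tsc_start() right before the code under test and tsc_stop() right after it, then subtract the two 64-bit values to get the elapsed count.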
Thanks