Questions tagged [rdtsc]

RDTSC is the x86 read time stamp counter instruction often used for high resolution timing.

See How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures.

Get CPU cycle count? has info on various caveats of using it: on modern x86, it measures reference cycles, not actual core clock cycles. (And also shows how to access it from C++.)
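As a minimal sketch of the C++ access mentioned above, the following reads the counter through the `__rdtsc()` intrinsic and brackets it with `_mm_lfence()` so the read does not execute ahead of earlier instructions. It assumes GCC or Clang on x86-64 (MSVC would use `<intrin.h>` instead); the helper names `read_tsc_fenced` and `tsc_ticks` are hypothetical and do not come from the linked Q&A.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtsc, _mm_lfence (GCC/Clang; assumption: x86-64 target)

// Read the time stamp counter with an lfence on each side, so the read
// waits for earlier instructions and later ones do not start before it.
inline std::uint64_t read_tsc_fenced() {
    _mm_lfence();
    const std::uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

// Hypothetical helper: TSC ticks (reference cycles, not core clock cycles)
// that elapse while f() runs. The result includes the fence/rdtsc overhead.
template <typename F>
std::uint64_t tsc_ticks(F&& f) {
    const std::uint64_t start = read_tsc_fenced();
    f();
    const std::uint64_t stop = read_tsc_fenced();
    return stop - start;
}
```

Calling `tsc_ticks([&]{ work(); })` returns a tick count for `work()`. For very short regions the measurement overhead dominates, so repeating the region and taking a minimum or median is usually more informative than a single reading.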

The earliest CPUs to support RDTSC had a fixed clock frequency, and some OSes found it was more useful as a low-overhead time source for time-of-day functions, so CPU vendors eventually changed it to be how it is now: a fixed-frequency nonstop counter.

It can be out-of-sync across different cores. (Some CPUs avoid that for cores in the same physical package.)
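Because the counter ticks at a fixed frequency on such CPUs, tick counts can be converted to wall-clock time once that frequency is known. The sketch below estimates it by comparing `__rdtsc()` against `std::chrono::steady_clock` over a short sleep. It assumes GCC or Clang on x86-64, a constant-rate nonstop TSC, and that the thread is not migrated between cores with out-of-sync counters during the measurement. The names `estimate_tsc_hz` and `tsc_ticks_to_ns` are hypothetical.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <x86intrin.h>  // __rdtsc (GCC/Clang; assumption: x86-64 target)

// Estimate TSC ticks per second against steady_clock.
// Assumption: the TSC runs at a constant rate for the whole interval.
inline double estimate_tsc_hz(
    std::chrono::milliseconds interval = std::chrono::milliseconds(50)) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::uint64_t tsc_start = __rdtsc();
    std::this_thread::sleep_for(interval);
    const std::uint64_t tsc_stop = __rdtsc();
    const auto wall_stop = std::chrono::steady_clock::now();

    // Elapsed wall time in seconds, as a double.
    const std::chrono::duration<double> elapsed = wall_stop - wall_start;
    return static_cast<double>(tsc_stop - tsc_start) / elapsed.count();
}

// Convert a tick delta to nanoseconds using a previously estimated frequency.
inline double tsc_ticks_to_ns(std::uint64_t ticks, double tsc_hz) {
    return static_cast<double>(ticks) * 1e9 / tsc_hz;
}
```

The estimate is only as good as the sleep interval is long: a longer interval reduces the relative error from the two clock reads not being taken at exactly the same instant. Pinning the thread to one core sidesteps the cross-core synchronization caveat.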

137 questions
8
votes
0 answers

What's up with the "half fence" behavior of rdtscp?

For many years x86 CPUs supported the rdtsc instruction, which reads the "time stamp counter" of the current CPU. The exact definition of this counter has changed over time, but on recent CPUs it is a counter that increments at a fixed frequency…
BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
7
votes
4 answers

What's the equivalent of rdtsc opcode for PPC?

I have an assembly program that has the following code. This code compiles fine for an Intel processor. But when I use a PPC (cross)compiler, I get an error that the opcode is not recognized. I am trying to find if there is an equivalent opcode for…
Rob
  • 71
  • 2
7
votes
2 answers

Calculate system time using rdtsc

Suppose all the cores in my CPU have the same frequency; technically, I can synchronize system time and time stamp counter pairs for each core every millisecond or so. Then, based on the core I'm currently running on, I can take the current rdtsc value…
e271p314
  • 3,841
  • 7
  • 36
  • 61
7
votes
1 answer

How to calculate the frequency of CPU cores

I am trying to use RDTSC but it seems like my approach may be wrong to get the core speed: #include "stdafx.h" #include #include #include using namespace std; struct Core { int CoreNumber; }; static void…
Alexandru
  • 12,264
  • 17
  • 113
  • 208
6
votes
2 answers

Why is RDTSC a virtualized instruction on modern processors?

I am studying RDTSC and learning about how it is virtualized for the purposes of virtual machines like VirtualBox and VMWare. Why did Intel/AMD go to all the trouble of virtualizing this instruction? I feel like it can be easily simulated with a…
Robert Martin
  • 16,759
  • 15
  • 61
  • 87
5
votes
2 answers

Clang optimizes out RDTSC asm blocks thinking the repeated block yields the same as the previous block. Is this legal?

Suppose we have some repetitions of the same asm that contains RDTSC such as volatile size_t tick1; asm ( "rdtsc\n" // Returns the time in EDX:EAX. "shl $32, %%rdx\n" // Shift the upper bits left. "or %%rdx,…
sandthorn
  • 2,770
  • 1
  • 15
  • 59
5
votes
1 answer

Does RDTSCP increment monotonically across multi-cores?

I'm confused whether rdtscp monotonically increments in a multi-core environment. According to the document: __rdtscp, rdtscp seems a processor-based instruction and can prevent reordering of instructions around the call. The processor…
stickers
  • 83
  • 1
  • 6
5
votes
1 answer

Better than 100ns resolution timers in Windows

I work on a programming language profiler and I am looking for a timer solution for Windows with better than 100 ns resolution. QueryPerformanceCounter should be an answer, but the frequency returned by QueryPerformanceFrequency is 10 MHz on Windows…
mvorisek
  • 3,290
  • 2
  • 18
  • 53
5
votes
1 answer

Is there any difference in between (rdtsc + lfence + rdtsc) and (rdtsc + rdtscp) in measuring execution time?

As far as I know, the main difference in runtime ordering between the rdtsc and rdtscp instructions is whether execution waits until all previous instructions have executed locally. In other words, it means lfence + rdtsc…
ruach
  • 1,369
  • 11
  • 21
5
votes
1 answer

"rdtsc": "=a" (a0), "=d" (d0) what does this do?

I'm new to C++ and benchmarking, and I don't understand what this part of the code does. I found something about the edx and eax registers, but I don't fully understand how that plays into the code. So I understand this code essentially returns the…
Manjari S
  • 51
  • 6
5
votes
1 answer

Can different processes run RDTSC at the same time?

Can different processes run RDTSC at the same time? Or is this a resource that only one core can operate on at the same time? TSC is in every core (at least you can adjust it separately for every core), so it should be possible. But what about Hyper…
kuga
  • 1,483
  • 1
  • 17
  • 38
5
votes
1 answer

RDTSCP in NASM always returns the same value (timing a single instruction)

I am using RDTSC and RDTSCP in NASM to measure machine cycles for various assembly language instructions to help in optimization. I read "How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures" by Gabriele…
RTC222
  • 2,025
  • 1
  • 20
  • 53
5
votes
1 answer

_mm_lfence() time overhead is non deterministic?

I am trying to determine the time needed to read an element, to tell whether it's a cache hit or a cache miss. To keep the reads in order, I use the _mm_lfence() function. I got unexpected results, and after checking I saw that the lfence function's overhead is not…
Ana Khorguani
  • 896
  • 4
  • 18
5
votes
2 answers

Why is CPUID + RDTSC unreliable?

I am trying to profile some code for execution time on an x86-64 processor. I am referring to this Intel white paper and have also gone through other SO threads discussing the topic of using RDTSCP vs CPUID+RDTSC here and here. In the above mentioned…
talekeDskobeDa
  • 372
  • 2
  • 13
5
votes
1 answer

cpuid + rdtsc and out-of-order execution

cpuid is used as a serializing instruction to prevent ooo execution when benchmarking, since the execution of benchmarked instructions might be reordered before rdtsc if it's used alone. My question is whether it is still possible for the…
ashen
  • 807
  • 9
  • 24