Equivalent of mkl_get_clocks_frequency() for non-intel compilers

Question

I use _rdtsc() with the Intel compiler to read the time stamp counter, together with mkl_get_clocks_frequency() to convert the counter readings to seconds. Both of them are specific to Intel compilers.

While I have an equivalent of _rdtsc() for GNU compilers using inline assembly, I do not have one for mkl_get_clocks_frequency().
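For reference, a typical GNU-style inline-assembly read of the counter looks like the sketch below (GCC/Clang on x86 only; the name rdtsc_asm is illustrative). GCC and Clang also provide __rdtsc() in <x86intrin.h>, which emits the same RDTSC instruction.

```cpp
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
// RDTSC puts the low 32 bits of the counter in EAX and the high 32 bits
// in EDX; combine them into a single 64-bit value.
inline std::uint64_t rdtsc_asm() {
    std::uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
#endif
```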

How can I estimate CPU clock rate in a portable fashion?
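The closest portable stand-in is to measure the TSC rate yourself against a clock the standard library provides. Below is a minimal C++11 sketch, assuming GCC or Clang on x86 (for __rdtsc() from <x86intrin.h>) and a TSC that ticks at a constant rate and is synchronized across cores, which, as the answers below explain, is not guaranteed. The result is the TSC tick rate, not necessarily the current core clock frequency, and it is only as accurate as the calibration interval allows. The names read_tsc and estimate_tsc_hz are hypothetical.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc (GCC/Clang)
#define EXAMPLE_HAVE_TSC 1
#else
#define EXAMPLE_HAVE_TSC 0
#endif

// Read the time stamp counter (returns 0 on targets without one).
inline std::uint64_t read_tsc() {
#if EXAMPLE_HAVE_TSC
    return __rdtsc();
#else
    return 0;
#endif
}

// Estimate TSC ticks per second by counting ticks across an interval that
// is timed with std::chrono::steady_clock. Returns 0.0 if it cannot.
double estimate_tsc_hz(std::chrono::milliseconds interval = std::chrono::milliseconds(100)) {
    using steady = std::chrono::steady_clock;
    const steady::time_point t0 = steady::now();
    const std::uint64_t c0 = read_tsc();
    std::this_thread::sleep_for(interval);
    const steady::time_point t1 = steady::now();
    const std::uint64_t c1 = read_tsc();
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    if (seconds <= 0.0 || c1 <= c0) return 0.0;
    return static_cast<double>(c1 - c0) / seconds;
}
```

Readings could then be converted with seconds = (end - start) / hz, calibrating once at startup rather than once per measurement.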

Art · Answer 1 · 2015-04-20T08:28:00.950

I will give you a non-answer. Sorry, but as far as I know there is no good answer to this. RDTSC only works on certain CPUs under very specific conditions, and interpreting the values it returns ranges from hard to impossible without help from the operating system. I therefore suspect no one has bothered to implement support for it in portable compilers/libraries (that is, in all of them except the Intel compiler).

Here's the long story:

The RDTSC instruction has had a long history of semantic changes that are very hard to keep track of in an application. On older Intel and AMD CPUs the TSC simply counted internal clock cycles, so with variable frequency (power-saving modes, etc.) the rate could change without any notification to the application. The frequency could have changed several times between two timestamps and you had no way of knowing that it happened.

Some CPU or BIOS versions suspended the TSC while in system management mode, while others didn't. The first behavior made the TSC useless for wall-clock time; the second made it useless for benchmarking. The last time I looked into this, there was no way of detecting which behavior you had other than comparing against a different clock and looking for large jumps.

Some CPUs didn't keep the TSC and/or its frequency synchronized between the CPUs in a system. This means that if the operating system moves your process from one CPU to another, the TSC value you read is at best totally useless and at worst subtly misleading.

The recent trend, and the promise of stability, has been a synchronized counter running at a fixed frequency on all CPUs (a truly constant frequency is not achievable because the clocks are sensitive to temperature, but that's another story). So finally we can use RDTSC without problems.
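On x86, the CPU advertises this "invariant TSC" property (a TSC that ticks at a constant rate in all power states) through CPUID leaf 0x80000007, EDX bit 8. A sketch of checking it with GCC/Clang, assuming <cpuid.h>'s __get_cpuid; the helper names are hypothetical. The bit says nothing about whether the TSCs of different sockets are synchronized; on Linux, the kernel's own verdict can be seen in /sys/devices/system/clocksource/clocksource0/current_clocksource.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // __get_cpuid (GCC/Clang)
#endif

// CPUID leaf 1, EDX bit 4: the CPU has a time stamp counter at all.
bool cpu_has_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1u, &eax, &ebx, &ecx, &edx)) return false;
    return (edx >> 4) & 1u;
#else
    return false;  // not x86: no TSC to ask about
#endif
}

// CPUID leaf 0x80000007, EDX bit 8: invariant TSC, i.e. the counter runs at
// a constant rate regardless of P-, C- and T-state changes.
bool cpu_has_invariant_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx)) return false;  // leaf not supported
    return (edx >> 8) & 1u;
#else
    return false;
#endif
}
```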

But then Intel threw us another curveball by deciding that RDTSC is no longer a serializing instruction (most likely not a conscious decision, but a mistake that Intel is getting away with by saying "it was never documented to be serializing"). This means that if you read the timer twice in your code, the second value can be lower than the first. Worse, much of the code you are benchmarking may not actually have run yet when the timer is read. The newer RDTSCP instruction "solves" this problem, but you need to figure out which CPUs actually implement it, which ones have an RDTSC reliable enough to use, and on which ones you just have to give up and use a better time source.
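A sketch of the RDTSCP approach for GCC/Clang on x86-64. It first checks CPUID (leaf 0x80000001, EDX bit 27) for RDTSCP support, then brackets the timed code with RDTSCP followed by LFENCE, which is one ordering recipe Intel documents (others use CPUID as a full serializing barrier). It also compares the IA32_TSC_AUX values returned by the two reads; on Linux these encode the CPU and node number, so a mismatch suggests the thread migrated between the reads. It cannot detect a context switch that resumes on the same CPU, and the result is still in ticks, not seconds. All function names are illustrative.

```cpp
#include <cstdint>

#if defined(__x86_64__)
#include <cpuid.h>      // __get_cpuid (GCC/Clang)
#include <x86intrin.h>  // __rdtscp, _mm_lfence (GCC/Clang)

// CPUID leaf 0x80000001, EDX bit 27 reports RDTSCP support.
bool cpu_has_rdtscp() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) return false;
    return (edx >> 27) & 1u;
}

// RDTSCP waits until all earlier instructions have executed before reading
// the counter; the LFENCE after it keeps later instructions from starting
// before the read. *aux receives IA32_TSC_AUX.
inline std::uint64_t read_tscp(unsigned* aux) {
    const std::uint64_t t = __rdtscp(aux);
    _mm_lfence();
    return t;
}

// Time fn in TSC ticks. Returns false (discard the sample) if TSC_AUX
// changed between the two reads, i.e. the thread likely moved to another CPU.
template <class F>
bool time_ticks(F&& fn, std::uint64_t* ticks) {
    unsigned aux0 = 0, aux1 = 0;
    const std::uint64_t t0 = read_tscp(&aux0);
    fn();
    const std::uint64_t t1 = read_tscp(&aux1);
    if (aux0 != aux1) return false;
    *ticks = t1 - t0;
    return true;
}
#endif
```

Call cpu_has_rdtscp() once before using the other functions; executing RDTSCP on a CPU without it raises an invalid-opcode fault.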

To add to this, you don't know whether your code actually ran between two calls to RDTSC or whether you were context switched out. I would therefore suggest sticking to the timing facilities your operating system provides and measuring the time your process is actually running. Those facilities are slower, but the operating system has most likely solved all these problems much better than you will ever be able to. As a bonus, if you use NTP or some other time synchronization mechanism, you will also get a clock rate much closer to real seconds, because those mechanisms track long- and short-term frequency drift that your application cannot possibly know about.
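As an illustration of measuring the time the process is actually running, here is a C++ sketch assuming a POSIX system with clock_gettime and CLOCK_PROCESS_CPUTIME_ID. It records both elapsed wall-clock time (CLOCK_MONOTONIC) and CPU time consumed by the process; time spent sleeping or descheduled shows up only in the former. The names measure and Timing are hypothetical.

```cpp
#include <time.h>  // clock_gettime, clockid_t, timespec (POSIX)

// Convert a timespec to seconds.
static double to_seconds(const timespec& ts) {
    return static_cast<double>(ts.tv_sec) + ts.tv_nsec * 1e-9;
}

// Current value of the given POSIX clock, in seconds.
static double now(clockid_t id) {
    timespec ts{};
    clock_gettime(id, &ts);
    return to_seconds(ts);
}

struct Timing {
    double wall_s;  // elapsed time on CLOCK_MONOTONIC
    double cpu_s;   // CPU time charged to this process
};

// Run fn once and report both wall-clock and process CPU time.
template <class F>
Timing measure(F&& fn) {
    const double w0 = now(CLOCK_MONOTONIC);
    const double c0 = now(CLOCK_PROCESS_CPUTIME_ID);
    fn();
    const double c1 = now(CLOCK_PROCESS_CPUTIME_ID);
    const double w1 = now(CLOCK_MONOTONIC);
    return {w1 - w0, c1 - c0};
}
```

Timing a 200 ms nanosleep this way gives a wall time of at least 0.2 s and a CPU time close to zero, because the process was not running while it slept.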

score 1 · Answer 2 · edited May 23 '17 at 11:43

You cannot do that portably, and even if you could, the result would be meaningless, as explained in Art's answer.

On Linux specifically, you could parse /proc/cpuinfo to get some information about CPU frequencies (which might already be out of date by the time you parse it). But that is still meaningless.
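A minimal C++ sketch of that parsing, assuming an x86 Linux system where /proc/cpuinfo contains one "cpu MHz" line per logical CPU (other architectures may omit the field entirely). The values are a snapshot of what the kernel reported at read time, which is exactly why they are of limited use: they can differ per core and change immediately afterwards. The function name cpuinfo_mhz is illustrative.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Collect the "cpu MHz" values from /proc/cpuinfo. Returns an empty vector
// if the file cannot be opened or the field is not present.
std::vector<double> cpuinfo_mhz(const std::string& path = "/proc/cpuinfo") {
    std::vector<double> mhz;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("cpu MHz", 0) != 0) continue;  // line must start with "cpu MHz"
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        try {
            mhz.push_back(std::stod(line.substr(colon + 1)));  // stod skips leading spaces
        } catch (const std::exception&) {
            // Ignore malformed values.
        }
    }
    return mhz;
}
```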

On Linux, you should read time(7) and in practice use clock_gettime(2), which is fast thanks to the vdso(7) mechanism.
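A small C++ sketch of reading CLOCK_MONOTONIC, plus a rough way to check the per-call cost on your own machine. It assumes Linux/POSIX and reports only an average, so treat the number as an indication rather than a benchmark. CLOCK_MONOTONIC is not affected by jumps in the wall clock but is slewed by NTP adjustments; the Linux-specific CLOCK_MONOTONIC_RAW is not. The function names are illustrative.

```cpp
#include <cstdint>
#include <time.h>  // clock_gettime (POSIX)

// Current CLOCK_MONOTONIC reading in nanoseconds.
std::int64_t monotonic_ns() {
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Average cost of one clock_gettime call, in nanoseconds, over `calls` calls.
double clock_gettime_cost_ns(int calls = 1000000) {
    const std::int64_t start = monotonic_ns();
    for (int i = 0; i < calls; ++i) monotonic_ns();  // external call, not optimized away
    const std::int64_t end = monotonic_ns();
    return static_cast<double>(end - start) / calls;
}
```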

With a C++11-compliant compiler and standard library (e.g. libstdc++), you could use <chrono>.
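With <chrono>, a portable version looks roughly like this sketch (time_it and to_seconds are illustrative names). steady_clock is the clock to use for intervals because it is guaranteed never to go backwards, unlike system_clock; its resolution is implementation-defined.

```cpp
#include <chrono>

// Run fn once and return the elapsed time measured with steady_clock.
template <class F>
std::chrono::nanoseconds time_it(F&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}

// Convert a duration to floating-point seconds, the unit the question asks for.
inline double to_seconds(std::chrono::nanoseconds d) {
    return std::chrono::duration<double>(d).count();
}
```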

The POCO framework library (wrapping several OSes) has some timer support.

