Detect if processor has RDTSCP at compile time

Question

Some new Intel processors have both the RDTSC and RDTSCP instructions, while most older processors have only the RDTSC instruction.

While coding in C/C++, how can I detect at compile time whether the architecture being used has the RDTSCP instruction?

I know we can check this manually by browsing the CPU info (e.g., cat /proc/cpuinfo) and then adjusting our code. But getting this information at compile time (as a macro or flag value) would remove the need to check and edit the code manually.

A thought for you: will your code always be run on the same machine, or could it be run on different machines, some of which may have the RDTSCP instruction and others which may not have the RDTSCP instruction? — Jonathan Leffler, Sep 02 '15 at 15:47
I'm skeptical there is a compiler macro, but `cpuid` would always return the correct value. — Jason, Sep 02 '15 at 15:52
As Jonathan has pointed out, you need to compile both versions, and pick one when your program starts, because presence of RDTSCP is a feature of the runtime environment, not the compile environment. — Ben Voigt, Sep 02 '15 at 15:56
@JonathanLeffler: Yes, my code is expected to run on various machines (some with `RDTSCP` and others with `RDTSC` only). On machines without `RDTSCP`, I depend on `RDTSC` alone. I find it very time-consuming to check cpuinfo and edit the code manually every time. — user1082170, Sep 02 '15 at 15:56
If done at program startup, once, it won't be all that time consuming (you won't notice it compared to the cost of the dynamic loader loading the shared libraries, for example). If it is done every time you need to do some timing (within a single run of the program), then it's a more serious problem — but it is a problem largely fixable by code design. At worst, if you run the RDTSCP version and find that the CPU doesn't have the instruction, you re-execute the RDTSC version of the program instead (nasty, but doable). More likely, you self-configure — a function pointer to the code or … — Jonathan Leffler, Sep 02 '15 at 16:00
@Jason Can (and how) `cpuid` provide if the architecture has `RDTSCP` instruction? Does it set some specific bit in some register for `RDTSCP`? — user1082170, Sep 02 '15 at 16:04
@Junaid - according to [this document](http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf) from Intel, it should be a flag listed in the `/proc/cpuinfo` (see Chap 2.1 - Introduction). Doesn't answer your question exactly about how to use `cpuid` (I don't know that, unfortunately). — tonysdg, Sep 02 '15 at 16:10
Yes, there's a `cpuid` leaf (80000001H) for processor features you can use. Normally, it's called using assembly, but I think there's a compiler intrinsic for it. — Jason, Sep 02 '15 at 16:18
I'm not sure what you're using both for though. You can get the majority of the functionality of `rdtscp` using `rdtsc` and an `lfence`. — Jason, Sep 02 '15 at 16:29
@Jason Thanks for your pointers. Yes, [this document](http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf) (as pointed out by @tonysdg too) provides the exact reasoning behind my use of the `RDTSCP` + `RDTSC` instructions. — user1082170, Sep 03 '15 at 08:50
Just a heads up, `cpuid` is an expensive instruction so you may want to cache the value in a global at startup. — Jason, Sep 03 '15 at 15:02
@Jason: Yes, you are right. I am putting this in a global after only calling at the start of the program. — user1082170, Sep 03 '15 at 15:08
@Junaid Hello, what build system do you use? Can't you just figure that out during configuration of the build and set the appropriate macro to the compiler flags? — Zaffy, Feb 25 '19 at 20:05

Hadi Brais · Accepted Answer · 2020-01-18T13:26:32.457

GCC defines many macros to determine at compile-time whether a particular feature is supported by the microarchitecture specified using -march. You can find the full list in the source code here. It's clear that GCC does not define such a macro for RDTSCP (or even RDTSC for that matter). The processors that support RDTSCP are listed in: What is the gcc cpu-type that includes support for RDTSCP?.

So you can make your own (potentially incomplete) list of microarchitectures that support RDTSCP. Then write a build script that checks the argument passed to -march and sees whether it is in the list. If it is, define a macro such as __RDTSCP__ and use it in your code. I presume that even if your list is incomplete, this should not compromise the correctness of your code.
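The C side of that build-script idea might look like the sketch below. The macro name `HAVE_RDTSCP` and the function name `read_tsc_static` are hypothetical and not part of any compiler or library; the sketch assumes the build script passes `-DHAVE_RDTSCP=1` when the `-march` value is on its list. A plain name is used rather than `__RDTSCP__` because identifiers containing a double underscore are reserved for the implementation. The fallback path (LFENCE followed by RDTSC) follows Jason's comment on the question and only approximates what RDTSCP provides.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, __rdtscp, _mm_lfence (GCC/clang) */

/* Assumption: HAVE_RDTSCP is defined to 1 by the build script, not by the
 * compiler. _mm_lfence needs SSE2, which is baseline on x86-64; a 32-bit
 * build would need -msse2. */
static inline uint64_t read_tsc_static(void) {
#if defined(HAVE_RDTSCP) && HAVE_RDTSCP
    unsigned aux;            /* receives IA32_TSC_AUX; unused here */
    return __rdtscp(&aux);   /* waits for earlier instructions to execute */
#else
    _mm_lfence();            /* keep RDTSC from running ahead of earlier loads */
    return __rdtsc();
#endif
}
```

A binary built with `-DHAVE_RDTSCP=1` will fault with SIGILL on a CPU that lacks the instruction, so this only makes sense when the build machine and the target machine are known to match.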

Unfortunately, the Intel datasheets do not seem to specify whether a particular processor supports RDTSCP even though they discuss other features such as AVX2.

One potential problem here is that there is no guarantee that every single processor that implements a particular microarchitecture, such as Skylake, supports RDTSCP. I'm not aware of such exceptions though.

Related: What is the gcc cpu-type that includes support for RDTSCP?.

To determine RDTSCP support at run-time, the following code can be used on compilers supporting GNU extensions (GCC, clang, ICC), on any x86 OS. cpuid.h comes with the compiler, not the OS.

#include <cpuid.h>

int rdtscp_supported(void) {
    unsigned a, b, c, d;
    if (__get_cpuid(0x80000001, &a, &b, &c, &d) && (d & (1<<27)))
    {
        // RDTSCP is supported.
        return 1;
    }
    else
    {
        // RDTSCP is not supported.
        return 0;
    }
}

__get_cpuid() runs CPUID twice: once to check max level, once with the specified leaf value. It returns false if the requested level isn't even available, that's why it's part of a && expression. You probably don't want to use this every time before rdtscp, just as an initializer for a variable unless it's just a simple one-off program. See it on the Godbolt compiler explorer.
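One way to do the "check once, then reuse" initialization described above is a function pointer that resolves itself on the first call. This is only a sketch under stated assumptions: the names `read_tsc`, `read_tsc_impl`, `tsc_resolve`, `tsc_with_rdtscp` and `tsc_with_rdtsc` are made up for illustration, and the LFENCE + RDTSC fallback is the approximation mentioned in the comments on the question, not an exact replacement for RDTSCP.

```c
#include <cpuid.h>      /* __get_cpuid (GCC/clang/ICC) */
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, __rdtscp, _mm_lfence */

static uint64_t tsc_with_rdtscp(void) {
    unsigned aux;                 /* IA32_TSC_AUX value; unused here */
    return __rdtscp(&aux);
}

static uint64_t tsc_with_rdtsc(void) {
    _mm_lfence();                 /* needs SSE2: baseline on x86-64 */
    return __rdtsc();
}

static uint64_t tsc_resolve(void);

/* Starts out pointing at the resolver; replaced on the first call. */
static uint64_t (*read_tsc_impl)(void) = tsc_resolve;

static uint64_t tsc_resolve(void) {
    unsigned a, b, c, d;
    /* Same test as rdtscp_supported(): leaf 0x80000001, EDX bit 27. */
    int has = __get_cpuid(0x80000001, &a, &b, &c, &d) && (d & (1u << 27));
    read_tsc_impl = has ? tsc_with_rdtscp : tsc_with_rdtsc;
    return read_tsc_impl();
}

uint64_t read_tsc(void) {
    return read_tsc_impl();       /* CPUID runs only on the first call */
}
```

After the first call, each `read_tsc()` costs one indirect call and no CPUID. In a multithreaded program it is probably simplest to call `read_tsc()` once before starting other threads, so the pointer is already resolved.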

For MSVC, see How to detect rdtscp support in Visual C++? for code using its intrinsic.

For some CPU features that GCC does know about, you can use __builtin_cpu_supports to check a feature bitmap that's initialized early in startup.

// unfortunately no equivalent for RDTSCP
int sse42_supported() {
    return __builtin_cpu_supports("sse4.2");
}

If you're building with `-march=native`, you can detect whether the host supports RDTSCP by looking at `/proc/cpuinfo` if the host is Linux. e.g. `grep -l '^flags[[:space:]]*:.*rdtscp' /proc/cpuinfo` [CMake test for processor feature](//stackoverflow.com/a/54817625) — Peter Cordes, Feb 22 '19 at 02:40
@PeterCordes Note that the value of `EDX` is undefined if `__get_cpuid` returned zero (which indicates no support for the specified CPUID leaf). I think we should either initialize d to zero or include `__get_cpuid` in the if expression as it was before the edit. — Hadi Brais, Feb 26 '19 at 04:04
I had already noticed that and changed the code back to how you had it in the Godbolt link, but forgot to copy the final version back into the answer! Sorry about that. — Peter Cordes, Feb 26 '19 at 04:15

score 2 · Answer 2 · edited Feb 26 '19 at 04:13

Editor's note: https://gcc.gnu.org/wiki/DontUseInlineAsm. This answer for a long time was unsafe, and later edited to not even compile while still being unsafe (clobbering RAX making the "a" constraint unsatisfiable, while still missing clobbers on registers that CPUID writes). Use the intrinsics in another answer. (But I've fixed the inline asm in this to be safe and correct, in case anyone does copy/paste it, or wants to learn how to use constraints and clobbers properly.)

After investigating a little more based on the suggestions made by @Jason, I have now a run-time solution (still not a compile-time one) to determine if RDTSCP exists by checking the 28th bit (see output bitmap) of the cpuid instruction with 0x80000001 as input in EAX.

int if_rdtscp() {
    unsigned int edx = 0;    // stays 0 ("not supported") if the non-GNU branch below is compiled
    unsigned int eax = 0x80000001;
#ifdef __GNUC__              // GNU extended asm supported
    __asm__ (     // doesn't need to be volatile: same EAX input -> same outputs
     "CPUID\n\t"
    : "+a" (eax),         // CPUID writes EAX, but we can't declare a clobber on an input-only operand.
      "=d" (edx)
    : // no read-only inputs
    : "ecx", "ebx");      // CPUID writes E[ABCD]X, declare clobbers

    // a clobber on ECX covers the whole RCX, so this code is safe in 64-bit mode but is portable to either.

#else // Non-gcc/g++ compilers.
    // To-do when needed
#endif
    return (edx >> 27) & 0x1;
}

If this doesn't work in 32-bit PIC code because of the EBX clobber, then 1. stop using 32-bit PIC because it's inefficient vs. 64-bit PIC or vs. -fno-pie -no-pie executables. 2. get a newer GCC that allows EBX clobbers even in 32-bit PIC code, emitting extra instructions to save/restore EBX or whatever is needed. 3. use the intrinsics version (which should work around this for you).

For now I am fine with GNU compilers, but if somebody needs to do this under MSVC, there is an intrinsic way to check this, as explained here.

edited Feb 26 '19 at 04:13

Peter Cordes

answered Sep 03 '15 at 11:57

user1082170

You can actually simplify the code a bit. With embedded assembly you can tie variables to specific registers. In the input section, you can add `"a"(eax)`, and for output `"=d"(edx)`. Then you can eliminate the `mov`s and the `ecx` variable if you want. If you do get rid of the `ecx`, just make sure it's in the list of clobbered registers. – Jason Sep 03 '15 at 15:15
This gives error `error: ‘asm’ operand has impossible constraints`. The pre-edit version doesn't give the error. – haelix Nov 01 '18 at 21:09
@haelix: that's because it uses `"a"` as an input and `"rax"` as a clobber. You don't need inline asm, use `__get_cpuid()` instead. [How do I call "cpuid" in Linux?](//stackoverflow.com/q/14266772) – Peter Cordes Feb 20 '19 at 17:31
@PeterCordes thanks, but I'm not seasoned with assembly and it would take a bit of effort to grasp the change I need to do. Would it be possible for someone to amend the present Answer into a ready-to-use solution? – haelix Feb 21 '19 at 14:55
I ran into a very odd issue with this code. I put it inside the c-tor of a (C++) static variable and the program hangs at startup (it keeps using more and more memory). I am unable to debug it. I have never seen anything quite like it. Any ideas of what could cause it? – haelix Feb 21 '19 at 16:16
@haelix: what's "this"? The code in this answer doesn't compile. Did you remove the "rax" clobber, making it unsafe by clobbering RAX without telling the compiler? That could certainly lead to an infinite loop over init functions, maybe repeatedly calling a `std::vector` or something. That's why you should call `__get_cpuid` instead of using inline asm at all. – Peter Cordes Feb 21 '19 at 16:58
@PeterCordes by "this" I mean I am using the pre-edit version (v1), because like you say, the latest version of the answer (v2) doesn't compile. Please can you advise? – haelix Feb 21 '19 at 17:30
@haelix: never use that, it's broken and should be expected to cause exactly the kind of breakage you're seeing. Oh, I just looked. That and this are both missing an RBX clobber, not RAX. That makes even more sense; RBX is call-preserved so it's normally used for loop variables in loops that call functions. You should downvote this answer because it's broken. It has the right CPUID feature bit, but the inline asm is dangerous. – Peter Cordes Feb 21 '19 at 17:31
@PeterCordes OK understood, but please can you instruct where/how exactly to "call `__get_cpuid`"? I'm good with C++ but unfortunately I have no understanding of assembly or built-ins/intrinsics and such. I remember looking for `cpuid` once without success. Would you perhaps be able to post a corrected answer? – haelix Feb 21 '19 at 17:33
@PeterCordes - I have actually started a bounty for this. Will happily award 50 rep if you fix the current answer. – haelix Feb 21 '19 at 17:40
While the above answer is not correct as is, it is a good start. 1. Checking at _compile_ time is not possible (no preprocessor macro, no gcc target() notation, etc). Of course, nothing prevents cmake/configure logic from checking the availability ahead of compilation. 2. As said as well, this only holds if it is ensured that the program runs on the same CPU it was compiled on. 3. The edited answer works for gcc & icc, and should work in ctors as well. 4. If there is a system gcc, it will include the mentioned cpuid.h header, which does the ASM magic for you. Otherwise please see above. – Rainer Keller Feb 22 '19 at 16:08
@RainerKeller _The edited answer works_ - No, we have established that the edited answer doesn't compile. – haelix Feb 22 '19 at 21:41
@haelix -- sorry, i have re-edited the above code on Feb 22nd. The main issue is the missing clobber of rbx in 64-bit code (or ebx for 32-bit compilation). My edit will only appear after it has been peer-reviewed. Shall I include a completely new answer? – Rainer Keller Feb 23 '19 at 06:18
@rainer if you do that (while mentioning that you're building upon the existing answer), I'll award the bounty to you. It seems the original answerer is not around. Also the question itself is misleading, it says "compile-time" which should not ever be mentioned in this context - please say you're not proposing a solution to the compile-time problem, as one such solution cannot exist. Thanks – haelix Feb 24 '19 at 10:52
@RainerKeller: your edit doesn't show up as pending. And that's not the only problem, it's that eax needs to be a `"+a"` input/output, because you can't declare a clobber on a reg used as an operand. – Peter Cordes Feb 24 '19 at 20:49
@haelix: Hadi's answer now includes working code. It was correct before my edit, but I polished it up some. You should probably award it the bounty. – Peter Cordes Feb 26 '19 at 03:55
@RainerKeller: I fixed the asm in this answer. Tested on https://godbolt.org/z/jNhxZS. (That doesn't prove the clobbers are safe, because I didn't include any surrounding code that would try to use RAX afterwards, for example. But a `"+a"` read-write operand and then leaving the variable unused afterwards is IMO the best way to tell the compiler what's going on.) – Peter Cordes Feb 26 '19 at 04:10
@PeterCordes Hadi's answer is working for me (it might have worked before your edit too, I did not try; I also didn't try your edit on _this_ answer). Awarding the bounty to Hadi's answer. – haelix Feb 26 '19 at 12:48

score -1 · Answer 3 · answered Sep 02 '15 at 16:31

I've been trying to get something to work and have so far been unsuccessful but you might want to try looking down the SFINAE route: https://en.wikipedia.org/wiki/Substitution_failure_is_not_an_error

I thought there might be a slim chance I could inject the assembly into a lambda and cause this to fail if the instruction doesn't exist on the platform, or succeed if it does, however lambdas can't be used with decltype. If you can somehow feed the assembly code into a template parameter then it can be done, but I don't know whether that would be possible. SFINAE is really cool but can get your head in a spin very quickly.

If you're on *nix, another (possibly naïve and reasonably inelegant) way to do it would be to write a program that runs that assembly instruction and then catches a SIGILL and executes the version of the program without the special instructions.
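The SIGILL idea from the paragraph above could be sketched roughly as follows, assuming a POSIX system and GCC or clang on x86. The names `rdtscp_probe`, `on_sigill` and `probe_env` are hypothetical. Instead of re-executing a different program, this sketch only reports whether the instruction faulted. The CPUID check shown in the other answers is simpler and is likely the better choice; this is here only to illustrate the technique.

```c
#define _POSIX_C_SOURCE 200809L  /* sigaction, sigsetjmp under strict -std= */
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <x86intrin.h>           /* __rdtscp */

static sigjmp_buf probe_env;

static void on_sigill(int sig) {
    (void)sig;
    siglongjmp(probe_env, 1);    /* jump back out of the faulting instruction */
}

/* Returns 1 if RDTSCP executed, 0 if it raised SIGILL or the handler
 * could not be installed. Not thread-safe: uses one global jump buffer. */
int rdtscp_probe(void) {
    struct sigaction sa, old;
    volatile int ok = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return 0;

    if (sigsetjmp(probe_env, 1) == 0) {  /* 1: save and restore signal mask */
        unsigned aux;
        (void)__rdtscp(&aux);            /* faults here if unsupported */
        ok = 1;
    }

    sigaction(SIGILL, &old, NULL);       /* restore the previous handler */
    return ok;
}
```

As with the CPUID check, the result would normally be computed once at startup and stored, rather than probed before every timing call.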

But there must be a nicer way than this to do it, and I should think that looking at compiler-specific macros would be the way to do it.

Good Luck!

stellarpower

I know a dirty way to achieve what I want. I could write a function which could do `fopen("/proc/cpuinfo","r")` and then read and parse the data to find `RDTSCP`. But I was looking for something more generic and independent of underlying operating system ... – user1082170 Sep 03 '15 at 08:26
Well, I'm getting somewhere with the SFINAE route, but I don't know if it will work in the end. g++ is allowing me to put non-existing instructions into an asm declaration in a lambda, as long as I don't call that lambda. However when I call it in the SFINAE class' argument list as a default parameter, it compiles fine. If I can modify it somehow so that it fails properly, then you will be able to use a static const bool to control whether you include the extra processor instructions, and you can switch on it at compile-time or runtime (as long as you don't have anything that won't compile.) – stellarpower Sep 03 '15 at 10:21
This can't possibly work on G++. It doesn't have a built-in assembler; it compiles to asm and *then* feeds that to the assembler. Also, the assembler doesn't respect the `-march=` ISA extension filter. Using the `__rdtscp()` intrinsic / builtin could have some chance of working, but GCC doesn't restrict `__rdtscp()` by `-march=` setting either. – Peter Cordes Feb 20 '19 at 20:28

score -2 · Answer 4 · answered Feb 20 '19 at 17:06

Hello, you can use the CPUID flag to check at compile time whether the instruction exists. For that you need two things. First, preprocessor guards like:

#ifdef __RDTSCP__
    // code path for a CPU that has the instruction
#else
    // code path for a CPU that doesn't have it
#endif

Second, you have to compile the code with a gcc flag, for example:

gcc x.c -o x.o -march=native

This gcc command compiles your code for the native instruction set of your CPU, so it defines the macros for the features your CPU reports.

Mindkid

It would be nice if this worked, but none of gcc, clang, ICC, or MSVC define that macro. https://godbolt.org/z/at5ALR shows that all 4 compilers use the `#else` branch, with `-march=skylake`, `-march=znver1`, or `-march=knl`. (Or for MSVC, x86-64 + AVX2 because it doesn't target specific CPUs.) – Peter Cordes Feb 20 '19 at 17:36
Hello Peter, I answered like that because it worked for me. I'm doing a project that must know whether the processor has the clflushopt instruction: if it does, use it; if it doesn't, use clflush as the flush function... – Mindkid Feb 21 '19 at 18:12
I don't have access to the makefile and the code right now. When I get them I can post them here so you can test on your machine. I tested it on 2 separate machines, one that has the instruction and one that doesn't, and the CPUID check worked: if the CPU has it, the code uses clflushopt as the flush function; if it doesn't, it uses clflush. I answered this question because I had the same problem, and since I found a solution that worked for me, I shared it so it could be helpful for you... I don't know the compiler site that emulates the compiler; I checked the gcc man page and used it as I mentioned. – Mindkid Feb 21 '19 at 18:18
Godbolt doesn't "emulate" gcc, it literally runs an actual install of gcc. Compilers do define `__CLFLUSHOPT__`, but not `__RDTSCP__`. So yes, this solution works for CLFLUSHOPT, but not for RDTSCP. Not all ISA extensions have their own CPP macro, in gcc/clang/ICC, unfortunately. You can't just make up macro names without checking that the compiler actually does define it or not depending on `-march` options. (Also, ICC19 `-march=skylake` or `-xHOST` doesn't define `__CLFLUSHOPT__`, so unfortunately even that's not fully portable :( – Peter Cordes Feb 21 '19 at 18:23
Hello Peter, you have a point, and I'm sorry. I assumed that it worked because if you search the Intel intrinsics guide you see RDTSCP listed as the flag for its intrinsic, just as CLFLUSHOPT is for clflushopt. For clflushopt I searched immintrin.h / clflushoptintrin.h for its macro, and it was `__CLFLUSHOPT__`, so I assumed that for RDTSCP it would be `__RDTSCP__`. That was my assumption, sorry. – Mindkid Feb 21 '19 at 19:30
