CPUID on multiple cores/NUMA

Question

I am working on CPU detection and general environment detection code for my hobby OS. Is there ever a case where CPUID needs to be called multiple times? That is, if the system has multiple cores, does the OS need to call CPUID on each core? The same question applies to NUMA.

Both the AMD and Intel CPUID manuals are unclear on this. There is an article on the OSDev wiki, Detecting CPU Topology, that mentions calling CPUID, but to my reading it is not clear about when and how many times CPUID needs to be called.

I suppose one situation where CPUID might act unusually is AMD Fusion, which contains both a CPU and GPU on the same chip. You'd have to take a look at the documentation for more information. Other than that, I don't see why any internal CPU core would show a different CPUID than another core. – Polynomial, Dec 19 '11 at 22:20
Besides the given answer, another reason to call CPUID multiple times is when using the rdtsc instruction for performance measurements. You typically execute cpuid before it, since cpuid is a serializing instruction and keeps earlier instructions from being reordered around the timestamp read. But cpuid also has the nasty habit of taking longer to execute the first few times it's called (according to an old Intel manual on rdtsc), so it's typical to call it a few times at startup to warm it up, then use it before all your rdtsc calls. – Joseph Garvin, Sep 02 '12 at 18:51
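
The warm-up-then-serialize pattern described in the comment above can be sketched roughly as follows. This is a minimal illustration assuming GCC or Clang inline assembly on x86-64; the helper names (`cpuid_serialize`, `warm_up_cpuid`, `serialized_tsc`) and the number of warm-up rounds are assumptions made for the example, not something taken from a manual.

```c
#include <stdint.h>

/* Execute CPUID leaf 0 purely for its serializing side effect.
   The register outputs are discarded. */
static inline void cpuid_serialize(void)
{
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                     :
                     : "memory");
    (void)ebx;
    (void)edx;
}

/* Call CPUID a few times at startup so that later calls are past the
   slower first executions the comment mentions. */
void warm_up_cpuid(int rounds)
{
    for (int i = 0; i < rounds; i++)
        cpuid_serialize();
}

/* Serialize, then read the time-stamp counter.
   RDTSC returns the 64-bit counter split across EDX:EAX. */
uint64_t serialized_tsc(void)
{
    uint32_t lo, hi;
    cpuid_serialize();
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```

A measurement would call `warm_up_cpuid(3)` once at startup (the count is arbitrary here) and then take the difference of two `serialized_tsc()` readings around the code under test.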

score 3 · Accepted Answer · answered Dec 25 '11 at 18:24

Since it's been almost a week and nobody has been able to answer this (probably because of the holidays), I'll attempt to answer this anyway.

I think the answer is yes. You may need to call CPUID on each core. One reason for this is that not all (even x86) systems today are homogeneous.

For example, I've read on an overclocking forum (I can't find the link) that it's possible to mix two different processor models on some dual-socket server boards. The person had a dual-socket LGA 1366 system with two processors of different speeds (and different model numbers).

So in this case, the result of CPUID depends on which processor the thread is running on. Therefore you'll need to call it once on each processor to get all the information.

The manual for one of my server motherboards also states that you are allowed to mix processors of different models (with certain restrictions). And it's certainly possible to mix two different steppings of the same processor model.

This reason alone (heterogeneous topology) is enough to require calling CPUID on each core.
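
For a hobby kernel, one way to act on this is to have each core run CPUID on itself during bring-up and store the result in a per-core table. The sketch below assumes GCC or Clang (for `__get_cpuid` from `<cpuid.h>`, x86 only). The names `cpu_table`, `record_cpu_info`, `cores_homogeneous` and the `MAX_CPUS` limit are hypothetical, and only leaf 1 is recorded to keep it short.

```c
#include <cpuid.h>   /* __get_cpuid: GCC/Clang helper, x86 only */
#include <stdint.h>

#define MAX_CPUS 64  /* hypothetical limit for this sketch */

/* Hypothetical per-core record; a real kernel would store more leaves. */
struct cpu_info {
    uint32_t signature;    /* CPUID.1:EAX, encodes family/model/stepping */
    uint32_t features_ecx; /* CPUID.1:ECX feature flags */
    uint32_t features_edx; /* CPUID.1:EDX feature flags */
    int      valid;        /* nonzero once this slot has been filled */
};

struct cpu_info cpu_table[MAX_CPUS];

/* Must be called by code running ON the core being described, for example
   early in each application processor's startup path. `index` is whatever
   core numbering the kernel uses. Returns 0 on success, -1 on failure. */
int record_cpu_info(unsigned index)
{
    unsigned int eax, ebx, ecx, edx;

    if (index >= MAX_CPUS)
        return -1;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;  /* leaf 1 not supported */

    cpu_table[index].signature    = eax;
    cpu_table[index].features_ecx = ecx;
    cpu_table[index].features_edx = edx;
    cpu_table[index].valid        = 1;
    return 0;
}

/* Returns 1 if every recorded core reports the same signature, 0 otherwise. */
int cores_homogeneous(void)
{
    const struct cpu_info *first = 0;

    for (unsigned i = 0; i < MAX_CPUS; i++) {
        if (!cpu_table[i].valid)
            continue;
        if (!first)
            first = &cpu_table[i];
        else if (cpu_table[i].signature != first->signature)
            return 0;
    }
    return 1;
}
```

Once all cores have called `record_cpu_info`, the boot processor can call `cores_homogeneous()` to decide whether it can treat the first entry as representative or has to consult the table per core.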

Mysticial

Sorry for the long time responding, but yes I believe this is correct. Each CPU on a die needs to have CPUID information extracted and stored. – nixeagle Mar 22 '12 at 21:37
So what happens if you successfully call CPUID on every core of every CPU and they have different instruction sets, i.e. one CPU has SSSE3, another SSE4.2? You then use this information to decide to enter a piece of code that uses SSE4.2 instructions. In the middle of executing this code the OS swaps out your thread, and the next time it's scheduled it is on the CPU that only supports up to SSSE3. Then the code crashes executing an SSE4.2 instruction. So it's not enough to call CPUID on multiple cores; one must also set the thread affinity after doing so. Is this correct? – Apriori Mar 28 '14 at 17:58
@Apriori That sounds extremely unlikely. I'm aware of a single Intel or AMD x86 configuration that will let you install different CPUs from different generations with different instruction sets. But if you wanted to be excessively careful, yeah, you can do that. I think it's overkill though. – Mysticial Mar 28 '14 at 18:02
I thought that approach sounded a little paranoid; still, it could be a "fun" crash dump to surface some day. All the explanations I've read describing how to use the CPUID instruction have not mentioned anything about the multi-proc/core scenario, but it seems like it may be a bigger problem than most devs using x86 SIMD realize. Another thing I'm unclear on is whether it's best to call CPUID every time your code might branch, or once and cache the result. Agner Fog has CPUID as 100 to 250 cycles latency on Sandy Bridge. I digress; this may warrant a new question, which I'm happy to post. – Apriori Mar 28 '14 at 18:25
@Apriori Correction to my last comment. "I'm *not* aware". I somehow dropped that word. In short, I don't think it's possible to build a system where the processors have different instruction sets. I suppose it's possible inside a VM, but then that's just asking for it. – Mysticial Mar 28 '14 at 18:28
Thanks for calling that out. I was running out of characters in my last response to mention that I assumed you meant "you're not aware." I suppose, though, that if you assume one set of instruction sets per machine, and dispatching is the only thing the CPUID instruction is used for in your software, then you would only have to call CPUID once per system, not once per core, in this very specific case. – Apriori Mar 28 '14 at 18:38
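
The dispatch concern raised in these comments can be made concrete: if per-core feature words are available, ANDing them once and caching the result yields only the features that every core supports, so a thread can migrate freely without hitting an unsupported instruction. The sketch below is plain C and takes the per-core CPUID.1:ECX values as input. The bit positions are the documented ones for leaf 1 ECX; the function and enum names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Feature bits in CPUID.1:ECX. */
#define FEAT_SSSE3  (UINT32_C(1) << 9)
#define FEAT_SSE4_1 (UINT32_C(1) << 19)
#define FEAT_SSE4_2 (UINT32_C(1) << 20)

/* AND together the CPUID.1:ECX words gathered on each core.
   A bit survives only if every core reported it. */
uint32_t common_features(const uint32_t *per_core_ecx, size_t n)
{
    if (n == 0)
        return 0;  /* no cores recorded: claim no features */

    uint32_t common = per_core_ecx[0];
    for (size_t i = 1; i < n; i++)
        common &= per_core_ecx[i];
    return common;
}

/* Hypothetical code paths a program might choose between. */
enum code_path { PATH_BASELINE, PATH_SSSE3, PATH_SSE4_2 };

/* Pick the best path that is safe on every core. Intended to be called
   once, with the result cached, rather than before every branch. */
enum code_path pick_path(uint32_t common)
{
    if (common & FEAT_SSE4_2)
        return PATH_SSE4_2;
    if (common & FEAT_SSSE3)
        return PATH_SSSE3;
    return PATH_BASELINE;
}
```

With the mixed machine from the comment above (one CPU with SSSE3 only, another with SSE4.2), `pick_path` falls back to the SSSE3 path, which avoids both the crash and the need to pin the thread.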
