Why does it work to store CPUID results in an int array and pass to printf "%s"?

Question

#include <stdio.h>

int main() {
    int index = 0;
    int reg[3];
    __asm__(
        "cpuid \n\t"
        : "=b"(reg[0]), "=c"(reg[2]), "=d"(reg[1])
            // editor's note: CPUID also modifies EAX; this buggy code doesn't tell the compiler about it
        : "a"(index)); 
    printf("%s\n",&reg);
}

Why is the int array treated as a string? I need some help understanding why this works.

The `printf` call invokes *undefined behavior* - and your compiler should be warning you about it — UnholySheep, Apr 19 '20 at 18:27
Consult an instruction set reference. The `cpuid` instruction packs the string into 3 registers. Note you should really use a fixed length format for that. — Jester, Apr 19 '20 at 18:27
@smartalec “prints the correct answer” doesn't mean that your code is correct. You can fix this e.g. using a union: `union { int reg[3]; char str[12]; } id;` then, use a fixed length format like `%.12s` to work around the lack of NUL-termination. — fuz, Apr 19 '20 at 18:30
@smartalec And as Jester said: cpuid is specified to work like this. If you want to know why, ask Intel. I suppose it's an easy way to return a short string from an interface normally used to return bitmasks and integers. — fuz, Apr 19 '20 at 18:31
So do these registers return integers, or what? :) I'm confused :) — smartalec, Apr 19 '20 at 18:45
Your code is buggy: you forgot to tell the compiler that your `asm` statement modifies EAX. Using `"+a"(index)` as an in/out operand would be an easy way to do that. — Peter Cordes, Apr 19 '20 at 19:06
@zwol Eric Postpischil's answer already addresses all relevant issues. Feel free to accept his. — fuz, Apr 19 '20 at 19:27

score 3 · Answer 1 · edited Apr 25 '20 at 00:37

When zero is given in eax, the cpuid instruction returns two things:

in eax¹, the maximum value supported for the input operand in eax for cpuid on this processor, and
in ebx, edx, and ecx, the processor's vendor identification string (“GenuineIntel” on Intel processors), distributed across those registers in that order. The bytes of the string are simply put in the registers, four bytes in each register.

The assembly code you have shown causes the GCC or Clang compiler to copy the latter registers to your reg array.

To print this array correctly, you could pass to printf:

a format string containing %.12s to print at most 12 characters, and
a char * that points to the first byte of reg.

For example:

printf("%.12s\n", (char *) reg);

(Note that this conversion to char * is specifically allowed by C aliasing rules: The bytes of any object may be accessed using a pointer-to-character type. Other pointer conversions, or uses of their results, are not always defined by the C standard. Pedantically, (char *) &reg may be needed, as it provides a pointer to the first byte of the entire array rather than a pointer to the first byte of reg[0]. A strict interpretation of the C standard could say that the latter pointer is not reliable for arithmetic beyond reg[0].)

Footnote 1: Modifying an input-only operand is a bug; the compiler will assume that EAX is unmodified across the asm statement. In this case, that could lead to calling printf with al != 0, even though there are no floating-point values in XMM registers. (x86-64 System V calling convention.) With other callers / surrounding code, the problems could be more serious.

Since you don't care about the value of index after your asm statement, a read/write "+a" operand is an easy way in this case to tell the compiler that EAX is also modified:

    int index = 0;
    int reg[3];
    __asm__(
        "cpuid"
        : "+a"(index), "=b"(reg[0]), "=c"(reg[2]), "=d"(reg[1])
        );
