
Valgrind changes the values returned by the CPUID instruction. Simply put, how can I make Valgrind pass through the values the real CPU reports?

For reference, I discovered this while chasing strange errors in AES-NI detection on an old computer that I know does not support the AES-NI instruction set. The AES-NI flag is not the only thing affected, though: more than one bit of the result differs.

The behavior can be reproduced with valgrind-3.10.1 and the following C code:

#include <stdio.h>

int main(void) {
        /* eax = 1 selects the feature-flag leaf; ecx = 0 selects subleaf 0 */
        unsigned eax = 1, ebx, ecx = 0, edx;
        __asm__ volatile("cpuid"
                : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
                :  "0" (eax),  "2" (ecx)
        );
        /* ECX bit 25 of leaf 1 is the AES-NI feature flag */
        if(ecx & (1u<<25)) {
                printf("aes-ni enabled (ecx=%08x)\n", ecx);
        } else {
                printf("no aes-ni support (ecx=%08x)\n", ecx);
        }
        return 0;
}

It compiles and runs as follows:

$ gcc -o test test.c
$ ./test
no aes-ni support (ecx=0098e3fd)
$ valgrind ./test
==25361== Memcheck, a memory error detector
==25361== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==25361== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==25361== Command: ./test
==25361==
aes-ni enabled (ecx=0298e3ff)
==25361==
==25361== HEAP SUMMARY:
==25361==     in use at exit: 0 bytes in 0 blocks
==25361==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==25361==
==25361== All heap blocks were freed -- no leaks are possible
==25361==
==25361== For counts of detected and suppressed errors, rerun with: -v
==25361== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Note that the same binary reports ecx=0098e3fd when run natively but ecx=0298e3ff under Valgrind, so the feature flags the program sees do not match the real CPU.

ks1322
cegfault
  • Possibly relevant info: https://sourceforge.net/p/valgrind/mailman/message/31960632/ – Joe Feb 09 '18 at 01:55
  • @cegfault, do note that valgrind is a CPU emulator at heart. Unlike some profiling tools, it reads and executes the machine code, collecting data (counting instructions rather than time). I'm not sure you'd be able to test CPU specific code on valgrind, but I may be wrong. – Myst Feb 09 '18 at 02:26
  • @Myst yes, Valgrind is basically a virtual interface (can't find memory leaks without virtualizing and controlling memory libraries). That said, if Valgrind's virtual system is hard-coded, then it can't be used to test fall-back code (as with aes-ni; I have aesni intrinsics for systems that support it, with raw C-code as fallback). I can't test that fallback code for memory leaks without rewriting how that code is called, and I'd be disappointed if Valgrind required that. – cegfault Feb 09 '18 at 03:11
  • Besides, it's precisely *because* Valgrind is virtual that I'd assume its feedback and flags could be modified. It's easier to modify that (especially to *disable* a feature) than to modify a bare-metal setup. – cegfault Feb 09 '18 at 03:12
  • @cegfault I'm not saying it can't be done... I'm no valgrind expert... however, I would consider (as a workaround) compiling the code with a compilation flag that forces the fallback code. This will allow you to test the code no matter the CPU you're running under. I know it's a nuisance, but it's the only idea I have that could help. – Myst Feb 09 '18 at 03:15
  • I think that might be my only option, unfortunately :( – cegfault Feb 09 '18 at 14:52

2 Answers


After a couple of days with no answers, it appears that Valgrind has no way to return the host's real CPUID response.

Because Valgrind essentially runs the program on a virtual processor, it reports CPUID information for the processor it emulates, not for the system's physical processor.

Thanks to a comment by @Joe the following link shows a conversation about this dating back to 2014: https://sourceforge.net/p/valgrind/mailman/message/31960632/

In short, it would be nice if Valgrind had a runtime option for setting CPUID flags (as was suggested in the linked thread), but to date (February 2018) no such option exists.

cegfault

I am tired of endless discussions about what is right and wrong for options and philosophy, so I downloaded the Valgrind source and modified guest_amd64_helpers.c. In every place where cpuid is emulated, I read the real CPUID values into a local array using the instruction itself, then replaced the first and second SET_ABCD calls with the values from that array, like this:

    unsigned int intelId[8];
    __get_cpuid(0, intelId    , intelId + 1, intelId + 2, intelId + 3);
    __get_cpuid(1, intelId + 4, intelId + 5, intelId + 6, intelId + 7);
    switch (0xFFFFFFFF & st->guest_RAX) {
    case 0x00000000:
        SET_ABCD(intelId[0], intelId[1], intelId[2], intelId[3]);
        // SET_ABCD(0x00000001, 0x68747541, 0x444d4163, 0x69746e65);
        break;
    case 0x00000001:
        SET_ABCD(intelId[4], intelId[5], intelId[6], intelId[7]);
        // SET_ABCD(0x00000f5a, 0x01000800, 0x00000000, 0x078bfbff);
        break;

Don't forget, of course, to include <cpuid.h> at the top of guest_amd64_helpers.c.