
I would like to program and read the hardware performance counters offered on all recent x86 hardware.

On Linux there are the various perf_events systems to do this (and the perf utility to do it from outside an unmodified program).

Is there any such built-in facility in Windows? If no built-in facility exists, the second best would be another approach perhaps using third-party code, but that doesn't require me to get a driver signed.

BeeOnRope
  • Doing a quick google search: https://www.youtube.com/watch?v=c1In6NbJt5E&t=2581s – ifconfig Aug 01 '17 at 03:43
  • https://msdn.microsoft.com/en-us/library/windows/desktop/aa373214(v=vs.85).aspx – Michael Petch Aug 01 '17 at 03:43
  • @MichaelPetch - I saw it, but it seems they are not talking about "hardware PMU counters" but higher-level performance counters that report mostly OS and framework-level stuff tracked in software. I would like to be proven wrong, however! – BeeOnRope Aug 01 '17 at 03:45
  • @BeeOnRope I read the question too quickly otherwise I would have posted this: https://msdn.microsoft.com/en-us/library/windows/desktop/dd796399(v=vs.85).aspx – Michael Petch Aug 01 '17 at 03:47
  • @MichaelPetch - I don't think it really works (unless you are a corporation of sufficient size). From that page: _To profile hardware performance counters, you need a driver to configure the counters._ Even supposing I am willing to write such a driver, and pay for a code signing certificate, the required certs are not issued to individuals. – BeeOnRope Aug 01 '17 at 03:57
  • https://learn.microsoft.com/en-us/windows-hardware/drivers/install/how-to-test-sign-a-driver-package – Michael Petch Aug 01 '17 at 04:24
  • @MichaelPetch - yup, but it would require all users to restart their boxes into "test mode" which is somewhat unlikely in my case. I'm aware of the driver approach and have used it successfully on my local box, but I'm looking for built-in functionality. – BeeOnRope Aug 01 '17 at 04:26
  • You didn't mention in your question this was for end users. I assumed you were doing it personally. The answer is yes the facilities exist to do it. Your question gives no other limitations besides being run from user mode. – Michael Petch Aug 01 '17 at 04:28
  • It's for open source software, so I can only hope it will have at least one user other than myself! I think it is more or less implied that when someone asks "is there an API to do X in Windows" an answer which involves "write your own kernel module" is out of scope, _especially_ given the financial and verification hurdles of that task in Windows. With enough RE I could create *any* API with a kernel module! @MichaelPetch – BeeOnRope Aug 01 '17 at 04:46
  • Can you use WinRing0.sys – harold Aug 01 '17 at 15:10
  • @harold - perhaps? I found various versions through a Google search, like [this one](https://github.com/openhardwaremonitor/openhardwaremonitor/blob/master/Hardware/WinRing0x64.sys) and I have successfully used a file by the same name to read perf counters before, but the provenance, license and safety of this file aren't clear to me. Various people seem to have copied it but I don't find any "origin story". – BeeOnRope Aug 01 '17 at 17:54
  • The info on the file itself says it's from OpenLibSys.org – harold Aug 01 '17 at 17:57
  • Latest known source of WinRing0 is available here: https://github.com/QCute/WinRing0 note that using this is contradictory with your "from a user-mode" since this is a kernel driver. I don't think there is another way anyway than using a kernel driver. – Simon Mourier Aug 04 '17 at 08:19
  • @Simon - once you load the driver it enables user-mode rdpmc instructions. So you need the kernel driver to flip the CR4.PCE bit which allows `rdpmc` and also to allow counter programming, but then you can do your reads in user mode. I'd still be very interested in any method that doesn't use a kernel driver at all, but it's better than nothing (mostly it is acceptable because someone else has already jumped through the signing hoops). – BeeOnRope Aug 04 '17 at 15:14
  • If you go that route, the answer to "can I do from user mode possibly using a custom kernel driver" is always "yes". – Simon Mourier Aug 04 '17 at 17:38
  • @simon Obviously not. There are all sorts of things you simply can't do from ring 3 because the hardware doesn't allow it. Other things you can _always_ do from ring 3, such as most plain instructions, and there are a handful of things like `rdpmc` that fall in the middle which may or may not be allowed depending on the values of various control registers. The reads are fully in "user mode" when this is enabled, but you may need help from the kernel to enable it. – BeeOnRope Aug 04 '17 at 18:01
  • In any case, I'm not that interested in arguing semantics, but rather with taking a practical approach (perhaps I should add some kind of generic disclaimer to every question)? I want it to work. Ideally, it doesn't use a separate kernel driver at all (this is the case for user mode perf reads on Linux, for example). Failing that, the presence of a signed driver with a reasonable license and source would be the next best. Failing that, simply having source for such a driver but no signed version would be pretty bad but better than nothing at all. Makes sense? – BeeOnRope Aug 04 '17 at 18:05

1 Answer


Short answer

No, there's no built-in facility in Windows. Also, the Linux perf command doesn't work on the Windows Subsystem for Linux on Windows 10.

Long answer

To get access to those counters, you'll need a combination of these instructions:

Unfortunately, these instructions can only be executed in kernel mode, so you'll need to interface with a driver. While writing the driver code itself is easy, getting the driver signed is not (especially since, as you mentioned, you want to do this as an individual).
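As a rough illustration of what programming a counter involves: on Intel CPUs it comes down to writing an event-select value into a model-specific register with WRMSR and later reading the matching counter. The sketch below only computes that value. The bit layout and MSR addresses are as I recall them from the Intel SDM, the helper name `perfevtsel` is made up for this example, and Python is used purely for brevity.

```python
# MSR addresses as listed in the Intel SDM (assumption: Intel CPU; AMD uses different MSRs).
IA32_PERFEVTSEL0 = 0x186  # event-select register for general-purpose counter 0
IA32_PMC0 = 0x0C1         # general-purpose counter 0


def perfevtsel(event, umask, usr=True, os=False, cmask=0):
    """Build an IA32_PERFEVTSELx value (hypothetical helper for illustration)."""
    for field in (event, umask, cmask):
        if not 0 <= field <= 0xFF:
            raise ValueError("event, umask and cmask are 8-bit fields")
    value = event | (umask << 8) | (cmask << 24)
    if usr:
        value |= 1 << 16  # USR: count while running in ring 3
    if os:
        value |= 1 << 17  # OS: count while running in ring 0
    value |= 1 << 22      # EN: enable this counter
    return value


# Architectural "instructions retired" event: event 0xC0, umask 0x00, user mode only.
INST_RETIRED_USER = perfevtsel(0xC0, 0x00)
print(hex(INST_RETIRED_USER))  # 0x4100c0
```

A driver would then write this value to IA32_PERFEVTSEL0. On CPUs with architectural perfmon version 2 or later, the counter typically also has to be enabled in IA32_PERF_GLOBAL_CTRL (MSR 0x38F) before it starts counting.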

That's why I advise you to look into existing projects like Open Hardware Monitor and the pcm project by Intel.

Open Hardware Monitor

This open-source project is written in C# and includes binaries and C source-code of a WinRing0.sys (32-bit) / WinRing0x64.sys (64-bit) driver developed by OpenLibSys.org. If you want to use this driver in your project, you only need to include their copyright notice.

PCM

This open-source project is written in C++ and also contains source for a similar driver (see the WinMSRDriver directory), but you have to build it yourself, so you'll run into the signing problem again.

Anyway, I wanted to mention this project because it probably contains a lot of code that might be of interest to you.

User-Mode access

Now, once you have that driver loaded (Open Hardware Monitor extracts and loads the driver automatically when the application starts, which is pretty neat), you can call the driver's IOCTLs from your user-mode application using the Windows API functions CreateFile and DeviceIoControl, and of course CloseHandle when you're done.
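A minimal sketch of that CreateFile / DeviceIoControl / CloseHandle sequence, written in Python with ctypes to keep it short. The device path, the `OLS_TYPE` value, the function number 0x821 and the 4-byte-in / 8-byte-out buffer layout are assumptions recalled from WinRing0's headers, so check them against the driver source before relying on them. `read_msr` is a hypothetical helper name.

```python
import ctypes
import sys

# Assumed WinRing0 constants (verify against OlsIoctl.h in the driver source).
DEVICE_PATH = r"\\.\WinRing0_1_2_0"
OLS_TYPE = 40000
METHOD_BUFFERED = 0
FILE_ANY_ACCESS = 0


def ctl_code(device_type, function, method, access):
    """Python equivalent of the CTL_CODE macro from the Windows headers."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


IOCTL_OLS_READ_MSR = ctl_code(OLS_TYPE, 0x821, METHOD_BUFFERED, FILE_ANY_ACCESS)


def read_msr(index):
    """Ask the (already loaded) driver to execute RDMSR and return the 64-bit value."""
    if sys.platform != "win32":
        raise OSError("DeviceIoControl is only available on Windows")
    from ctypes import wintypes

    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.CreateFileW.restype = wintypes.HANDLE
    k32.CreateFileW.argtypes = [wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD,
                                wintypes.LPVOID, wintypes.DWORD, wintypes.DWORD,
                                wintypes.HANDLE]
    k32.DeviceIoControl.restype = wintypes.BOOL
    k32.DeviceIoControl.argtypes = [wintypes.HANDLE, wintypes.DWORD,
                                    wintypes.LPVOID, wintypes.DWORD,
                                    wintypes.LPVOID, wintypes.DWORD,
                                    ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID]
    k32.CloseHandle.argtypes = [wintypes.HANDLE]

    generic_read_write = 0xC0000000  # GENERIC_READ | GENERIC_WRITE
    open_existing = 3                # OPEN_EXISTING
    handle = k32.CreateFileW(DEVICE_PATH, generic_read_write, 0, None,
                             open_existing, 0, None)
    if handle is None or handle == wintypes.HANDLE(-1).value:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        request = wintypes.DWORD(index)  # input buffer: the MSR index
        reply = ctypes.c_uint64(0)       # output buffer: EAX (low) then EDX (high)
        returned = wintypes.DWORD(0)
        ok = k32.DeviceIoControl(handle, IOCTL_OLS_READ_MSR,
                                 ctypes.byref(request), ctypes.sizeof(request),
                                 ctypes.byref(reply), ctypes.sizeof(reply),
                                 ctypes.byref(returned), None)
        if not ok:
            raise ctypes.WinError(ctypes.get_last_error())
        return reply.value
    finally:
        k32.CloseHandle(handle)
```

With the driver loaded and the process running elevated, something like `read_msr(0xC1)` would return the current value of the first general-purpose counter on an Intel CPU. The same pattern in C or C++ is a direct translation of the three API calls.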

huysentruitw
  • Does it work seamlessly even from inside a VM-encapsulated O/S -- VmWare et al? – user3666197 Aug 03 '17 at 21:06
  • VMs often don't support all counters, so you may see a set of supported counters inconsistent with the reported processor family @user3666197 – harold Aug 03 '17 at 21:17
  • Excellent. Somehow I had overlooked that `LICENSE` file. I have already gotten everything working with `WinRing0.sys` (more precisely, its 64-bit counterpart, `WinRing0x64.sys`), so that's straightforward for me. My concern was more about the murky legal and source-availability situation with that driver, but if it's clean, it's clean. I wonder if we'll ever see a new signed version, though... @WouterHuysentruit – BeeOnRope Aug 03 '17 at 21:35
  • @BeeOnRope in the last few days I have successfully created the application + driver that reads all performance counters on kabylake / skylake on Windows 7 while benchmarking a masm application output from ml64; it looks like linux perf stat. I will post it as an answer soon – Lewis Kelsey Apr 16 '21 at 11:29
  • @LewisKelsey - this is awesome, any plans to open source it? Does it require you to put Windows in dev mode to run the driver? – BeeOnRope Apr 16 '21 at 22:15
  • @BeeOnRope right now you have to press f8 and boot with driver signature enforcement disabled and I've only tested it on windows 7. One thing actually is that some of the counters that appear in for instance the Broadwell table on Vol 3B and don't appear in the Kaby Lake table work on my Kaby Lake CPU. I haven't read anything on what this represents. I program 4 counters at a time, run a benchmark and then program the next 4 counters. The 4 rdpmcs are right before calling the entry point of the .exe to benchmark and show currently 1m – 1.01m uops for a 1m uop loop – Lewis Kelsey Apr 16 '21 at 22:25
  • I think the extra uops might be due to a clock interrupt happening because that's the only interrupt I didn't disable on the core during the benchmark – Lewis Kelsey Apr 16 '21 at 22:29
  • It looks like [this](https://imgur.com/zWHXD71) for a loop of dec jnz which is macrofused and in the uop cache. But remember the latter 3 counters are part of a different run to the former 4, which is annoying – Lewis Kelsey Apr 16 '21 at 23:21